I agree one should try new things and relentlessly test one's own dogmas. But regarding TDD, one study of "junior and senior computer science students" (described in an article I have to pay $19 to read) doesn't make somebody an expert on anything relating to how experienced professional programmers handle code bases that last years to decades.
You can't just say "This is bullshit 'cuz they're two students who don't know crap about the real world". Any John Doe can come along and claim whatever he wants on any matter, and as long as he has proper evidence to support his claims, you should not discard what he comes up with based only on the fact that he's a nobody, a student, the President, or Donald Knuth. The only thing that matters is the evidence John Doe brings.
But beyond that, the truth is they are not the first to come up with these results. What follows is a selection of near-verbatim quotes from Code Complete (the author does a great job covering the subject).
"Microsoft's applications division has found that it takes three hours to find and fix a defect by using code inspection, a one-step technique, and 12 hours to find and fix a defect by using testing, a two-step technique (Moore 1992)."
"Collofello and Woodfield reported on a 700,000-line program built by over 400 developers (1989). They found that code reviews were several times as cost-effective as testing - a 1.38 return on investment vs. 0.17."
"[...]the Software Engineering Laboratory found that code reading detected about 80 percent more faults per hour than testing (Basili and Selby 1987). "
"A later study at IBM found that only 3.5 staff hours were needed to find each error when using code inspections, whereas 15-25 hours were needed to find each error through testing (Kaplan 1995)."
Table 20-2. Defect-Detection Rates

Removal Step                              Lowest Rate   Modal Rate   Highest Rate
Informal design reviews                       25%           35%          40%
Formal design inspections                     45%           55%          65%
Informal code reviews                         20%           25%          35%
Formal code inspections                       45%           60%          70%
Modeling or prototyping                       35%           65%          80%
Personal desk-checking of code                20%           40%          60%
Unit test                                     15%           30%          50%
New function (component) test                 20%           30%          35%
Integration test                              25%           35%          40%
Regression test                               15%           25%          30%
System test                                   25%           40%          55%
Low-volume beta test (<10 sites)              25%           35%          40%
High-volume beta test (>1,000 sites)          60%           75%          85%

Source: Adapted from Programming Productivity (Jones 1986a), "Software Defect-Removal Efficiency" (Jones 1996), and "What We Have Learned About Fighting Defects" (Shull et al. 2002).
Really, I'm not saying that TDD is bad, not at all. What I'm saying is that this sentence:

"one study of 'junior and senior computer science students' (described in an article I have to pay $19 to read) doesn't make somebody an expert on anything"

dismisses the study based on who took part in it rather than on the evidence it presents.
You've missed wpietri's point. He didn't question the study because students wrote it (they didn't). He questioned the generalization from CS students doing a toy exercise to teams maintaining production software.
Your additional citations are irrelevant because "testing" and "TDD" are not the same thing: those studies compare code review against after-the-fact defect detection, while TDD is a test-first development discipline.
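To make that distinction concrete, here is a minimal sketch of the TDD red-green cycle versus ordinary after-the-fact testing; `slugify` and its tests are hypothetical examples invented for illustration, not anything from the studies being discussed:

```python
# TDD step 1 ("red"): the test is written BEFORE the implementation.
# Run at this point, it fails with a NameError because slugify
# does not exist yet. In plain "testing", this function would instead
# be written after the fact to check already-finished code.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("already-fine") == "already-fine"

# TDD step 2 ("green"): write just enough code to make the test pass.
def slugify(text: str) -> str:
    return text.lower().replace(" ", "-")

# TDD step 3 ("refactor"): restructure freely, rerunning the test
# as a safety net after each change.
test_slugify()
```

The point is that in TDD the tests drive the design of the code, whereas the studies quoted above measure testing purely as a defect-detection step.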
Actually you're right: after re-reading both the submitted link and the comment, I see I misunderstood his comment in context. Please disregard my last post.