
> If I were King of Science, or at least editor of a prestigious journal, I'd want to put word out that I'm looking for papers with at least one of: some sort of significant effect, or a p value of something like p = 0.0001. Yeah. That's a high bar. I know. That's the point.

And study preregistration to avoid p-hacking and incentivize publishing negative results. And full availability of data, aka "open science".



Preregistration, a requirement to publish negative or null results, and full data availability are, arguably, the three legs of modern science. If we collectively don't enforce them, nobody is doing science, they're just fucking around and writing it down.


I like rules like these. One context where preregistration, null results, and full data are all required is clinical trials overseen by the FDA. It’s no surprise that those studies carry a lot of weight.


Also replication studies for negative or null results in addition to positive ones (we don't have either).


You do realize there are a million negative results for every positive result? This is equally easy to game, maybe easier.


Yes, and knowing what's been tried and what has failed is important.


I think what's being pointed out is that "researchers" could pump out hundreds of easy-to-test negatives every day if negative results were just as incentivised.

I do agree, though: negatives are just as important when the intent is to prove or disprove a meaningful hypothesis.


A negative result won't make a career. I don't think there's much danger of over-incentivising negative results when they're only required to go into a repository. You can't mandate that Nature or Cell publish negative results.


we tried using 0.1 mL, it didn't work

we tried using 0.11 mL, it didn't work

we tried using 0.12 mL, it didn't work

we tried using 0.13 mL, it didn't work


    we tried using 0.10 mL, it didn't work
    we tried using 0.11 mL, it didn't work
    we tried using 0.13 mL, it didn't work
    we tried using 0.15 mL, it didn't work
    we tried using 0.17 mL, it didn't work
    we tried using 0.16 mL, it didn't work
    we tried using 0.18 mL, it didn't work
    we tried using 0.20 mL, it didn't work
    we tried using 0.14 mL, it didn't work
    we tried using 0.12 mL, it worked so we published
Do you want to know that the ones that "didn't work" existed? Or are you happy with just the one that "worked" being written up in isolation?


Running many trials and suppressing the ones that didn't work, especially with a small effect size, is one obvious way among many to p-hack your way to publication acceptance.

https://en.wikipedia.org/wiki/Replication_crisis
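
As a toy illustration of the scale of the problem (all numbers hypothetical): under a true null, p values are uniform on [0, 1], so a lab that runs ten experiments and writes up only the best one has roughly a 40% chance of a "significant" result at p < 0.05.

    # Sketch: if only the best of k null experiments is published, how
    # often does a "significant" p < alpha result appear by chance?
    import random

    alpha, k, trials = 0.05, 10, 100_000

    # Closed form: P(min of k uniform p values < alpha) = 1 - (1 - alpha)^k
    print(1 - (1 - alpha) ** k)  # ~0.401

    # Monte Carlo check: draw k null p values, keep only the smallest
    hits = sum(min(random.random() for _ in range(k)) < alpha
               for _ in range(trials))
    print(hits / trials)  # also ~0.40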


I don’t want to know about each test that didn’t work as a separate publication, that’s for sure!


That's true; we would need a way to collect this data so it's easily seen as part of a whole.

E.g. if you search for eggs and cholesterol, you should find all studies on whether eggs are OK for your cholesterol, with their summarized results, grouped by researcher, so if somebody runs 200 studies to find the one positive, it's instantly visible.
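
A tiny sketch of what that grouping could look like (the registry records and field names are all invented here):

    # Hypothetical registry query: group studies on one topic by
    # researcher, so a 1-positive-out-of-200 pattern is instantly visible.
    from collections import defaultdict

    studies = [  # (researcher, topic, found_positive) -- made-up records
        ("lab_a", "eggs-cholesterol", True),
        ("lab_a", "eggs-cholesterol", False),
        ("lab_b", "eggs-cholesterol", False),
    ]

    counts = defaultdict(lambda: [0, 0])  # researcher -> [positives, total]
    for who, topic, positive in studies:
        if topic == "eggs-cholesterol":
            counts[who][0] += positive
            counts[who][1] += 1

    for who, (pos, total) in counts.items():
        print(who, f"{pos}/{total} positive")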


You would read a meta-study that summarizes those tests, especially because they might have been run by different labs, and the fact that one of them worked might actually be a real effect caused by some other difference in the experiment.


If someone really tested those hypotheses, let them publish. I doubt they'd get funding, so it would be on their own dime. In practice people do run experiments like that, but they only publish the one trial in four that succeeds.


Look to physics for how negative results should be published. There typically has to be a reason to suspect some dosage range should work, in which case the sequence of studies you describe would be perfectly valid if it's within that range.


Why would someone want to game a negative result? Nobody ever becomes famous for saying "my approach doesn't work." (As long as science is open, so we can be sure there was actually good work done by researchers before reaching that negative result.)


To have their name on a publication, which is a currency in the academic world.


I've thought about the idea of allowing people to separately publish data and analysis. Right now, data are only published if the analysis shows something interesting.

Improving the quality of measurements and data could be a rewarding pursuit, and could encourage the development of better experimental technique. And a good data set, even if it doesn't lead to an immediate result, might be useful in the future when combined with data that looks at a problem from another angle.

Granted, this is a little bit self serving: I opted out of an academic career, partially because I had no good research ideas. But I love creating experiments and generating data! Fortunately I found a niche at a company that makes measurement equipment. I deal with the quality of data, and the problem of replication, all day every day.


It would be interesting to consider how much knowledge would never have been uncovered if you were King of Science. All those subtle, barely seen interactions in nature that on further investigation turned out to be something rather special.


Such as? It would also be interesting to explore how many dead ends we wouldn't have wasted time on, and so what other things might have been discovered sooner.


Scientists aren't stupid. No one saw a paper where a predictor explained 1% of the variance in an outcome and based solely on a significant p value decided that was a great road to base an entire career on. The problem, as described by the parent comment, doesn't really exist in funding structures and the scientific literature. It does occur to some degree in media coverage of science.

One could make the case that in GWAS studies it has occurred, but not because small effect sizes are inconsequential; the statistical methods just weren't able to separate grain from chaff for a while.

An allele that is responsible for 2% of the variation in disease risk might seem inconsequential, but 25 of those together can serve as a polygenic risk score that can predict disease and target treatment.
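
A back-of-the-envelope sketch of how those small effects stack (allele weights and genotypes are invented for illustration):

    # 25 independent alleles, each explaining ~2% of variance in risk;
    # under independence, variance explained is roughly additive.
    per_allele_r2 = 0.02
    print(25 * per_allele_r2)  # ~0.50 combined, despite each being "tiny"

    # The polygenic risk score itself is just a weighted sum of
    # risk-allele counts (weights and genotypes made up here):
    weights = [0.1] * 25             # hypothetical per-allele effect sizes
    genotype = [1, 2, 0, 1, 1] * 5   # 0/1/2 copies of each risk allele
    print(sum(w * g for w, g in zip(weights, genotype)))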


> Scientists aren't stupid. No one saw a paper where a predictor explained 1% of the variance in an outcome and based solely on a significant p value decided that was a great road to base an entire career on. The problem, as described by the parent comment, doesn't really exist in funding structures and the scientific literature.

Of course they're stupid. Everyone is stupid. That's why we have a "scientific method" and a formal discipline of logic to overcome fallacious reasoning and cognitive biases. If people weren't stupid we wouldn't need any of these disciplines to check our mistakes.

And yes, what you describe does happen all of the time. We literally just had a thread on HN about the failure of the amyloid hypothesis in Alzheimer's and the decades of work wasted on it. Many researchers are still trying to push it as a legitimate therapeutic target despite every clinical trial to date failing spectacularly. As Planck said, science advances one funeral at a time.

Which isn't to say that small effect sizes aren't legitimate research targets either, but if you're after a small effect size, the rigour should be scaled proportionally.
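
One way to make "rigour scaled proportionally" concrete: in the textbook two-sample power calculation, the required sample size grows with the inverse square of the effect size. A quick sketch using the standard normal-approximation formula (my numbers, not from the thread):

    # Approximate per-group n for a two-sample test: n scales as 1/d^2,
    # so chasing a small standardized effect d demands far more data.
    from statistics import NormalDist

    def n_per_group(d, alpha=0.05, power=0.80):
        z = NormalDist().inv_cdf
        return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2

    print(round(n_per_group(0.5)))  # ~63 per group for a medium effect
    print(round(n_per_group(0.1)))  # ~1570 per group for a small effect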


So your example of decades being wasted chasing an initial tiny effect size, all the time, was... An example of a failed mechanistic hypothesis that wasn't based on a tiny effect size.


I wasn't trying to post about the effect size specifically, but about general incentives and dead ends. But if you want a specific example, look no further than aspirin for myocardial infarction:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/

Quote:

> A commonly cited example of this problem is the Physicians Health Study of aspirin to prevent myocardial infarction (MI).4 In more than 22 000 subjects over an average of 5 years, aspirin was associated with a reduction in MI (although not in overall cardiovascular mortality) that was highly statistically significant: P < .00001. The study was terminated early due to the conclusive evidence, and aspirin was recommended for general prevention. However, the effect size was very small: a risk difference of 0.77% with r2 = .001—an extremely small effect size. As a result of that study, many people were advised to take aspirin who would not experience benefit yet were also at risk for adverse effects. Further studies found even smaller effects, and the recommendation to use aspirin has since been modified.

Long-term aspirin use has its own risks, like GI bleeds, and the small MI benefit clearly doesn't justify those risks.


It's hard to parse that example, because the citation it contains is to a meta-analysis that gives an effect size of aspirin for MI in the PHS in the form of an odds ratio of much greater magnitude. Digging a bit more, here's the actual result: the reduction in relative risk was 44%, not 0.77%. https://www.nejm.org/doi/full/10.1056/NEJM198907203210301

> There was a 44 percent reduction in the risk of myocardial infarction (relative risk, 0.56; 95 percent confidence interval, 0.45 to 0.70; P<0.00001) in the aspirin group (254.8 per 100,000 per year as compared with 439.7 in the placebo group).
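
A quick back-of-the-envelope with the incidence figures in that quote shows both framings can be right at once: the relative reduction is large while the absolute, cumulative difference stays under a percent (ignoring adjustment and censoring):

    # MIs per 100,000 person-years, from the NEJM quote above
    aspirin, placebo = 254.8 / 100_000, 439.7 / 100_000

    print(f"relative risk ~{aspirin / placebo:.2f}")    # ~0.58, near the published 0.56
    diff = placebo - aspirin
    print(f"absolute difference ~{diff:.3%} per year")  # ~0.185% per year

    # Over the ~5-year study that's roughly 5x, i.e. on the order of
    # the 0.77% cumulative risk difference the first paper cites.
    print(f"~{5 * diff:.2%} over 5 years")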

I'd agree if you had said from the start that you meant general incentives, especially in pharma development, but that is by and large a different conversation.


This paper was pretty clearly pre-specified: https://files.givewell.org/files/DWDA%202009/IPA/Masks_RCT_P...


And it was actually preregistered as well: https://osf.io/vzdh6/



