People will often demand a more rigorous standard of proof when the scientific issue being debated is highly emotional, sensitive, or potentially offensive.
For instance, suppose the issue is whether white people have larger fingernails than black people - a reasonably inoffensive topic - and suppose we rank academic rigor and research-study exhaustiveness on a 1-to-10 scale, with 1 being lax and 10 being as thorough and exhaustive a scientific study as possible. Most people would consider, say, an 8.5 to be acceptably rigorous.
But if the topic is of a much more potentially offensive nature - like gender and IQ - then suddenly many of those same people will argue that an “8.5” isn’t good enough. The study must score a “9.5,” maybe even a “10,” before it can pass muster.
But is this scientifically logical? The math is the same; the equations and formulas are the same. The main difference is that one issue is reasonably inoffensive whereas the other really rankles people. From a scientific perspective, shouldn’t facts be neutral - in the sense that whether something is politically offensive should not change the methodology or the standards demanded?
What keeps people from accepting a reasonable standard of evidence is bias, i.e., prior belief, ideology, and the amount of investment (emotional, psychological, financial, or some combination thereof). Science should and does continue on its merry way, questioning everything, including its own prior conclusions.
While I can understand what the OP is getting at, I don’t think it is hypocritical for “facts” with serious implications to need to be proved to a higher standard than facts without serious implications.
I’m not that vested in the idea that my shed has some shoddy engineering standards applied to it. I expect a bit better for my local nuclear power plant.
In addition, I think the statistical terms scientists use to describe the outcomes of tests are misunderstood by laypeople as meaning something other than what they mean, and this has a bearing on this discussion. When a test produces a p-value of .01, for example, that does not mean - as many people assume - that there’s a 99% likelihood that [whatever] is correct. It just means that there’s a 1% chance that random fluctuation alone would have produced results like these. Whether [whatever] actually is true - and, by implication, whether this is one of the times when random fluctuation has produced a false positive - depends on other evidence, including the a priori likelihood of the result being true.
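The base-rate point can be made concrete with a toy calculation. All of the numbers below are hypothetical, chosen only for illustration: suppose just 10% of hypotheses tested are actually true, tests have 80% power, and p < .01 counts as significant.

```python
# All numbers are hypothetical, chosen only to illustrate the base-rate point.
prior_true = 0.10   # fraction of tested hypotheses that are actually true
power = 0.80        # chance a true effect produces a significant result
alpha = 0.01        # significance threshold (p < .01)

# Among many tested hypotheses, what fraction of "significant" results are real?
true_positives = prior_true * power          # 0.08
false_positives = (1 - prior_true) * alpha   # 0.009
prob_true_given_significant = true_positives / (true_positives + false_positives)
print(round(prob_true_given_significant, 3))  # 0.899, not 0.99
```

Even with these fairly generous assumptions, a p of .01 leaves roughly a 10% chance the result is a false positive - and the lower the a priori plausibility, the worse that number gets.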
Now, this doesn’t have a direct bearing on whether something is sensitive or not. But to the extent that someone feels, on other grounds, that such-and-such a result is highly unlikely, it is logically valid for them to demand more than the level of statistical significance that might be accepted in another instance, and this is where I think there’s some overlap.
Your measure of “academic rigor and research-study exhaustiveness on a 1 to 10 scale” is such a trivial and puerile metric that it doesn’t even make sense. We don’t measure the validity of scientific conclusions by how much effort was put into researching the matter on some arbitrary scale. In the end, the validity of a study’s conclusion depends on the representativeness of the sample population, the clarity of the statement of hypothesis, the specificity with which design/research parameters are defined, and the certainty of the result stated in terms of probability and confidence level. When a researcher starts with the premise that their sample comprehensively represents the entire spectrum - and especially when they start from a small group of people in one narrow cohort which is expected to represent an entire gender, age group, or culturally defined racial category - it is entirely appropriate to be critical of just how well the essential premise of the study represents the population as a whole, especially when cultural and developmental influences are dismissed out of hand or assumed to be renormalized.
Qualification of the discovery of the Higgs boson - one of the major scientific discoveries of the early 21st century - was held to a requirement of five standard deviations (a p-value of roughly 3×10⁻⁷) before the researchers at CERN would even announce a discovery (and if you pay attention, they still couch the discovery in terms of probability and confidence). Although it is an important scientific endeavor, the discovery of the Higgs boson affects almost no one directly; it doesn’t govern education policy, or determine public funding, or serve as a basis for poorly founded bigotry. On the other hand, eighty years ago the German Reichstag enacted a set of carefully drafted, intricately detailed laws defining the rights and legal status of supposed racial groups (Jews, Roma, and blacks) based on the pseudo-science of eugenics, which ended up resulting in the internment, forced sterilization, and deaths of upward of three million people in government-authorized death camps alone.
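For a sense of just how stringent the five-sigma threshold is, the tail probability can be computed from the standard normal distribution with the Python standard library alone (a sketch; the one-tailed convention used in particle physics is assumed):

```python
import math

def one_tailed_p(sigma):
    """Upper-tail probability of a standard normal beyond `sigma` std deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

# Five sigma, the particle-physics discovery threshold:
print(f"{one_tailed_p(5):.2e}")  # 2.87e-07, about 1 chance in 3.5 million
```

Compare that to the p < .05 threshold conventional in much social-science work, which `one_tailed_p` reaches at well under two sigma.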
So, yes, it is entirely reasonable to hold research which asserts overarching conclusions about broad groupings which may have far-reaching impact up to a very high standard of statistical rigor and critical review, especially when the research is based on the risible premise that “women”, “blacks”, “Asians”, or “Caucasians” comprise a genetically homogeneous cohort, especially when those conclusions are based upon qualities such as intelligence that we can’t even agree on how to interpret in a restricted context or use to predict future potential.
For example, over half of psychology studies fail reproducibility tests. Link.
Next, once a problem is found, it appears to be quite hard to get fixed. Link.
I think peer review was an awesome idea. However, I think we need a different approach, as peer review has become less about ensuring the science is correct and more about getting the cred needed for career advancement (I published 600 peer-reviewed articles! Can I have a job? How 'bout a grant?). Apparently I am not the only one who thinks so:
Sydney Brenner, who won the Nobel Prize in Physiology or Medicine in 2002, says:
For a concrete example, there is Ocorrafoo Cobange. Who is that? It’s a fake name a guy made up. He then submitted fake papers to 300+ journals; 157 accepted them. Linky.
There seems to be a reasonable portion of the population who believe that if a scientist said it, it must be true. Furthermore, if a scientist said it in a paper and someone reviewed the paper, it must be absolutely true.
Reality doesn’t show that.
It appears, from the outside, that a few things are interfering with the process. One is prestige. The second is volume. I also suspect that, in some areas, pal review is an issue as well.
On the prestige front, the writers of papers obviously have a motive to get more papers published. The journals do as well. That leaves the reviewers as the last line of defense. If they take their time and do a good job, I suspect things can still work. However, if they don’t take the time, or assume that another reviewer will catch any errors, things can and will go south. Not sure how to fix this other than to make publishing an invalid paper painful enough that people won’t want to risk it.
On volume, I think that if the peer review process is tightened up, this will solve itself because people won’t be able to skate by on shaky research.
On the pal review front, I suspect that this is a problem in areas with low numbers of people in the field. The way around this would be to have people in a closely related field do the review, and to ensure that the reviewers do not have a relationship with the author(s) of the paper in any way.
So, I think that ALL research needs to be vetted in a better way. It seems pretty clear that peer review isn’t working well right now. There may be ways to reform it but, from the outside looking in, I have no clue as to what they are.
Note, I think that in the hard sciences this is less of a problem because the results are much easier to duplicate. In softer sciences, like psychology, not so much.
There are plenty of published, peer-reviewed research papers out there about racial and gender disparities with regard to certain cognitive metrics and aptitude tests.
What (intelligent) Dopers challenge is the belief that we know enough to say these disparities are attributed to genetics more than environmental influences. A paper that just looks at outcomes (e.g., “white” test scores versus “black” test scores) does not address this question, even when attempts are made to control for socioeconomics. To prove that a gene or set of genes is responsible for certain traits, you’d have to do a controlled experiment using animal models.
So yeah, you’re going to need a higher level of rigor when you’re testing this kind of hypothesis (“Genes are responsible for the differences between these groups”) rather than simply documenting a pattern (“These groups are statistically different”). You only need to know elementary statistics to tackle the latter.
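The pattern-versus-mechanism distinction can be made concrete with a minimal simulation (Python stdlib only, all numbers hypothetical): two groups are drawn from the same innate distribution, but one gets a purely environmental penalty built in by construction. The statistics faithfully detect the gap, yet nothing in the measured outcomes identifies its cause.

```python
import random
import statistics

random.seed(0)

def innate_score():
    """Same underlying distribution for everyone (hypothetical mean 100, sd 15)."""
    return random.gauss(100, 15)

group_a = [innate_score() for _ in range(1000)]
# Group B differs ONLY by an environmental offset of -5, by construction.
group_b = [innate_score() - 5 for _ in range(1000)]

gap = statistics.mean(group_a) - statistics.mean(group_b)
print(round(gap, 1))  # close to 5: a real, measurable gap - entirely environmental
```

An outcomes-only study of `group_a` and `group_b` would report a robust statistical difference; only knowledge of how the data were generated tells you it is environmental rather than innate, which is exactly why the causal claim demands a stronger design.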
Sure, but that’s a policy decision, not a science decision. On the grounds of strictly scientific validity, socially sensitive research topics do not require a higher standard of scientific certainty than any other research topics.
This thing about a fingernail study is such BS. First of all, we have a pretty good definition of what a fingernail is. And we have a pretty good definition of things like area, surface area, etc. And we have pretty good tools to measure them. None of which are the case for a concept like “intelligence.”
But aside from that, if someone came out with a study that discussed fingernail differences between white and black people, we’re all going to start asking the same questions. How did you define “white” and “black” people? How did you select your test subjects? Which metrics are you looking at when you say “bigger” or “smaller?” Did you do any sort of genetic analysis with your test subjects? Did you account for environmental factors? We have to ask all these questions so we can figure out exactly what is being studied.
We’re not holding intelligence studies to some higher threshold than fingernail studies. We’re holding them both to a scientific standard of rigor.
Personally? I look at literally centuries of white scientists desperately scrambling for some sort of scientific justification for their racist beliefs, from phrenology to neoteny to genetics, and I notice a certain pattern. Every single one of these racist theories has been discredited.
I look at centuries of physicists and their theories about the stuff of the cosmos. And I notice a pattern there too: physicists have had their share of discredited theories, but a lot more credible ones, and they don’t seem to be trying to justify preconceptions to nearly the extent that race “scientists” are.
I’m a layperson. I have to evaluate science using incomplete understanding, to the extent I evaluate it at all. You bet I’m going to hold the latest theory about why White People Are the Bestest up to more skepticism than I’ll hold the latest theory about gravity waves.
The general public has a poor understanding of basic statistics, so something like a p-value is not convincing to them. The best way to ensure rigor is probably reproduction.
Peer review can’t really determine whether a study is real; the issue was that the fake papers’ methods had serious holes.
The problem here isn’t even with peer review. All the journals were open access. That category contains many legitimate journals (e.g., PLOS ONE, which rejected it), but also many scam journals - they don’t have *any* peer review and will publish anything sent their way once you pay their fees. Many of these are based in China or India. Witness the classic paper (never published, but it would’ve been accepted had they paid) “Get Me Off Your Fucking Mailing List” (pdf)
It has to be remarked again that John Bohannon, of Science magazine, was criticizing problems with the way open-access journals publish science.
Sad to see an effort that discredits shoddy journals being used by some in an attempt to discredit all properly reviewed science journals. That was not the intention nor the target of the researcher.
No. It is closer to “Poorly defined hypotheses result in unsupported conclusions.” There is nothing extraordinary in the thesis that there will be differences in the statistics of a specific capability between distinct populations, but trying to make sweeping generalizations about specific innate differences in capability between heterogeneous or vaguely defined cohorts, and especially without critical distinctions between socialization, training, and education, is foolhardy.