intention, thanks for the very interesting post. That is an interesting study to be sure. Not sure how applicable it is to the physical sciences though. Despite the fact that the author talks in general terms about “science”, all of his examples seem to relate to the sort of purely statistical studies done in medicine that are very sensitive to issues of statistical significance.
It is also interesting that you take that statement on replication to be supporting you folks at ClimateAudit since the author’s definition of replication is completely different from yours:
So, by replication, they are talking about exactly what I am talking about: Doing a similar study and finding similar results. They are very clearly not talking about taking the data generated from the first study and “auditing” it by re-doing all the statistics calculations that the first study did to be sure that the authors did them correctly. (Again, their focus is on very different sorts of studies than are done for the most part in the physical sciences, although work on temperature reconstructions, which does have a large statistical component, is at least closer to what they are talking about. So, it sounds like they would be more interested in looking at what other studies since Mann have concluded in regards to the reconstructed temperatures rather than in going over Mann’s code line-by-line to see what he did.)
How are the authors’ examples different from the sort of purely statistical studies done in climate science that are very sensitive to issues of statistical significance? The signal we are looking for is so small (hundredths of a degree per year) and the data is so poor and fragmentary that the overwhelming majority of climate studies contain a huge statistical component – just like the studies that Ioannidis is discussing. The Hockeystick study (Mann, Bradley, Hughes '98) is nothing but statistics. They did not gather data, they did no experiments, there was no field work; it was 100% statistics.
So you’ve “moved on” from the Mann study as well?
The truth is that both types of replication are necessary – replicating the original procedures exactly is as important as replicating the procedure with similar data. You seem to believe that the first step in science is to perform an entirely new study which is “similar” to the original one, to see if you get “similar” answers.
In fact, that’s the second step. The first step is to see if the original study contains any mistakes, by answering questions like:
• Did the authors use the right data?
• Did they have ex ante data selection criteria?
• Did they use the correct mathematical approach?
• Did they perform the selected mathematics correctly?
• Did they report back all findings, adverse as well as supportive?
• Did they calculate the error bars and confidence intervals for their results?
• Have they revealed all of their methods and code, so that a “similar” study can even be performed?
• Did they follow their own procedures as listed in their study?
• Does following their procedures lead to the results that they claim?
Only when we know that the first study is theoretically and practically sound, only when we can do what the authors did and get the answers that the authors got, can we move on to looking at “similar” studies. There is no point in doing “another study statistically confirming the same hypothesis” until we see if the hypothesis is statistically confirmed per the authors’ claims in the first study. In fact, we can’t even do a “similar study” to what Mann did until we know exactly what Mann did do … which was tough, since he refused to reveal how he did it …
Otherwise, we just end up with a whole string of studies, for example, that depend on bristlecone pines as temperature proxies … sound familiar?
Yes, the other bristlecone-based studies are “similar” to the Hockeystick, just like you want, and yes, they give “similar” results, which seems to be your gold standard … but it is only because they are similarly flawed, and are based on similarly poor statistics and/or similarly biased data.
Again, jshore, as with your defense of not testing the climate models, I find your defense of not auditing and exactly replicating the climate studies to be absolutely incomprehensible. What kind of “scientist” doesn’t want to test the climate models, and to verify that the climate studies are correct by replicating their results, before moving on to “similar” studies? What kind of “science” do they practice in your neck of the woods, where things don’t get tested and verified before moving on?
In the Hockeystick study, Mann made an egregious statistical error, a real n00bie blunder, because as Mann himself said, “I am not a statistician”. The result of that error is that his method “mines” for hockeysticks, finding them in purely random red noise. Unfortunately, Mann used his position in the IPCC to splash his blunder all over the world, making the Hockeystick the icon of the AGW movement.
And he got away with it precisely because you, jshore, and others like you, seem to think that it is unimportant to make sure that the first experiment was performed correctly before moving on to the next, “similar” experiment.
In reading through the posts here, there seems to be a general misunderstanding of how research actually gets done and published. Concerning the replication issue raised by intention, for better or worse, replicating the precise study that someone else has actually done isn’t going to get published. But good researchers will make sure that they can replicate, to the extent possible, those previously published results before moving on to any extension of that work. I say “to the extent possible” because it is indeed too often the case that the exact data set used by the original researchers is not itself published or made widely available. However, that can be a strength rather than a weakness. When different researchers independently reach the same general conclusion using different methods or even different data, then that lends weight to the conclusion. When that happens again and again, then the scientific community reaches a point where they say there is no more real value to more of the same - that conclusion has been generally accepted.
That’s not to say that there are not contradictory conclusions that are published. Contradictions in general are what get papers published, not suppressed. A reviewer will look at a paper that says the same old thing and recommend not publishing because it’s the same old thing. But a paper that identifies a discrepancy or a contradiction can point to flaws in the generally accepted understanding or a previously unconsidered mechanism. Those are the interesting ones, and it’s no surprise that the climate literature is full of them. But these contradictions and discrepancies tend to be in the higher-order details, not in the primary theory. This is the type of debate that we see in the press about whether global warming has led to a greater number or intensity (or both) of tropical storms. There is still a lot of uncertainty about this particular issue, but much, much less uncertainty about the question of anthropogenically-driven global climate change. Uncertainty about the details does not imply uncertainty about the primary issue.
There were also questions raised about whether two peer-reviewed, published studies on the same topic could reach different and contradictory conclusions. Of course that can happen, but it does not mean that one of the two papers was based on falsified data. Different methods are often used to evaluate the available data, and there are always assumptions that are made that may be wrong, leading to different results. There are also instances where researchers reach conclusions that are broader than the data would support, even though the results can make it through the review process (as well described by jshore). In these cases, advocates tend to jump on published conclusions that support their position, either by pointing to the conclusion or to the fact that the conclusions are in error. But it needs to be pointed out that poor research is a far cry from falsification.
Regarding the discussion about why scientists might do this, people need to understand that the real currency of a researcher is reputation, not money. Although reputation can be tied to grant funding, it’s more closely tied to being right and being in the lead. A researcher that could conclusively demonstrate that global warming was a completely natural phenomenon that had nothing to do with emissions of greenhouse gases would seal his reputation as a great scientist for the ages. Even if old-timers would find it hard to publish something that contradicted their entire life’s work prior to that finding, the scientific community is full of up-and-coming researchers whose career goal is to make a name for themselves. In the completely open information exchange we have these days, there is no way that results of that magnitude could be suppressed.
The fact that they did not gather data or do experiments is irrelevant. I am talking about the basis of their studies. In medicine, studies are purely statistical…e.g., if you are trying to figure out if there is a correlation between heart attacks and several factors like obesity, lack of exercise, high fat diet, …, you do a study and look whether correlations exist.
As near as I understand it without investing a large amount of time, one point of that paper the Wall Street Journal wrote about is that if you actually look at 20 possible different factors, then purely by chance an average of one of them will correlate with heart attacks at a 95% confidence level. I don’t see how such concerns apply to the climate science field.
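To be concrete about the arithmetic, here is a minimal sketch (my own illustration, assuming 20 independent tests of purely random factors, which is of course a simplification):
[code]
alpha = 0.05      # the conventional 95% confidence level
n_tests = 20      # twenty unrelated factors, each tested once

expected_false_positives = n_tests * alpha      # 20 * 0.05 = 1.0
p_at_least_one = 1 - (1 - alpha) ** n_tests     # roughly 0.64

print(f"Expected spurious 'significant' correlations: {expected_false_positives:.1f}")
print(f"Chance of at least one: {p_at_least_one:.2f}")
[/code]
So with 20 shots at the 95% level you expect about one spurious hit, and the odds of getting at least one are nearly two in three.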
In practice, science rarely proceeds this way. To some degree, a referee will consider some of these questions but certainly not at the level of detail that you are talking about. And, if two studies are in stark conflict, then some people (perhaps the authors of one or the other of the studies) may be motivated to go back and try to look in somewhat more detail at these issues.
So, what you are proposing is essentially a change in scientific procedure.
No, he just refused to provide his computer code, just as most scientists do not release their code.
Well, it seems like your conclusion that everyone is doing things wrong is largely because they are getting a result that you don’t like.
This is not too surprising since I would imagine that Mann actually did it a few different ways and then wrote up for publication the one that he felt was best (perhaps because it seemed most formally rigorous). He is not the first one to do pathbreaking work that contains some errors but nonetheless gets pretty much the same answer one gets if one does it without the questionable methods.
I admit that there are still some unresolved issues in regards to temperature reconstructions, e.g., regarding the proxy data and the like. Such reconstructions are difficult and imperfect to be sure, as the NAS report notes, e.g., regarding the strong dependence “on data from the Great Basin region in the western United States.” Issues of the dependence of the result on certain aspects of the data were addressed by Mann et al. themselves in this paper, one year after their Nature paper was published.
No. I go by the facts. The NAS panel was clear on some things. For one, they agreed that the Hockeystick was dependent in its entirety on one dataset, the strip-bark (bristlecone) pines. And further, the NAS panel agreed that strip-bark (bristlecone) pines should not be used in tree ring paleoclimate reconstructions.
But then they went ahead and showed other reconstructions in their report that were similar to the Hockeystick that depended on the very datasets whose use they had just condemned (strip-bark/bristlecone pines). Some of the reconstructions they showed actually used the exact Mann PC1 data that they agreed was the cause of the “hockeystick” shape.
This kind of politically inspired “compromise science” is another reason why I don’t trust things like the NAS panel … the report was almost clinically schizophrenic, condemning something with one hand and praising the same thing with the other. For example, the Panel recommended that the Durbin-Watson statistic be used to gauge the validity of the various reconstructions … then approvingly spoke of reconstructions that failed that very test. I can give more examples of this in their report, there’s plenty of them.
So no, I make my decisions based on science. Sometimes I don’t like the results, but if they’re valid, they’re valid, regardless of my likes and dislikes.
Regarding the reconstructions that supposedly “replicate” the Hockeystick, I could go into almost endless detail regarding the exact mistakes of proxy selection, incorrect choice of statistics, lack of ex ante methods, failure to use a verification period, use of “grey” data versions, lack of archiving of data, minimization of error bars, and a host of other very technical issues if you’d like, but the short version is that the reconstructions that claim to “replicate” Mann’s work are riddled with inaccuracies.
Please get your facts straight before making this kind of accusation. Your response made absolutely no attempt to see if what I said was true. It was simply a distasteful personal attack, with no citations and no data, claiming that I made up my mind based on what I “like” rather than on the facts. I can assure you that I have spent hundreds of hours researching and writing about this very issue.
Surely at this point you know me better than that, jshore. I do my homework before making any claims … and I would encourage you to do the same.
I’ve explained exactly why in previous threads on this topic, and I know that you read those explanations. Nonetheless I am glad to see that you accepted this criticism as valid in your own words.
The last few centuries’ warming event is a self-selected sample. We are studying it in depth purely and entirely because it was so dramatic that it drew itself to our attention. It is no different from all those reports of cancer clusters in schools or office buildings, or the similarities between the lives of Kennedy and Lincoln, or any of the billions of other self-selected samples that fill the tabloids.
The problem with self-selected samples of this sort is that we have absolutely no idea how large our sample space is. We noticed global temperatures because they are undergoing a dramatic change. But if it hadn’t been global temperatures then it could have been encroachment of woody plants into grasslands, or the sex ratios of crocodiles, or the incidence of hurricanes, or the frequency of La Nina events or the incidence of red tides or the severity of frost damage on snowpeas. It could have been any of those or literally thousands of other possible factors that have been or mechanistically could be attributed to changes in carbon dioxide levels.
Which brings us back to what you just said: if you actually look at 20 possible different factors, then purely by chance an average of one of them will correlate with increases in carbon dioxide at a 95% confidence level.
Yet our sample space is far, far larger than just 20 different factors. With just a little thought I could make a list of literally thousands of factors that plausibly could be or actually have been mechanistically blamed on increases in CO2 levels.
The only reason that we have spent so much time investigating the correlation between CO2 and temperature is because we already knew that temperatures were increasing steadily. But if temperatures hadn’t been increasing steadily then we wouldn’t have investigated it to the extent that we have. And if the sex ratios of crocodiles or the incidence of hurricanes had been increasing steadily we would have spent much more time investigating those factors and, surprise surprise, those factors would have been 95% correlated with changes in atmospheric CO2.
This is the problem with self-selected samples of this type. Our sample space is infinitely large, far larger than 20 samples, yet you yourself admit that if you actually look at just 20 possible different factors, then purely by chance an average of one of them will correlate with increases in carbon dioxide at a 95% confidence level. In this reality, at this time, the factor that happens to correlate is rising temperature. In another reality it could have been the increase in woody plant density in grasslands or the sex ratios of alligators, but we don’t get to see these. We only see the factor that, purely by chance, correlates in this reality.
And as you yourself admitted, this is just a correlation. It isn’t cause and effect; it’s pure chance correlation, because if you look at just 20 possible different factors, then purely by chance an average of one of them will correlate with increases in carbon dioxide at a 95% confidence level. In this case it’s temperature that correlates.
And this is why the science needs to be exceptionally rigorous, and this is why anything less than a 95% correlation is meaningless.
And this is why the article that Intention quoted is highly relevant to the debate at hand.
And this is why it is disappointing that such a vocal proponent of the correlation doesn’t see how such concerns apply to the climate science field.
Blake has replied to this far more clearly than I could, many thanks. I’d just like to add a couple of points.
One is that a central tenet of science involves the calculation of standard errors, or confidence intervals. This is intimately related to the “one chance in 20” that jshore and Blake spoke of. Far too many “scientific” climate studies either ignore these entirely, or underestimate them greatly. Mann’s Hockeystick is a perfect example, as his error bars can be shown mathematically to be far too narrow.
The other is that much of climate science involves what are called “mathematical transformations”. In these, a dataset of some kind is subjected to a variety of mathematical operations, which results in a new “transformed” dataset.
Some examples will help to illustrate the concept. We take a dataset consisting of the strength of microwaves as seen by a satellite, and transform it into a temperature record of the atmosphere.
We take a dataset of tree rings, and transform them into an estimate of historical droughts and rainy periods.
We take a dataset of ground station temperatures, and transform them into a gridded global temperature dataset.
Now, before these mathematical transformations can be accepted scientifically, we need to be able to examine:
The original data.
The exact mathematical operations performed.
The final result.
Otherwise, we have no way of knowing whether the study has any validity.
Unfortunately, in climate science there have been far too many examples of scientists refusing to reveal some or all of the data and the operations. Many of the paleoclimate reconstructions depend on data which has never been archived. Thompson’s Guliya ice-core data, for example, is a staple of the reconstructions which show no Medieval Warm Period … but Thompson has refused to archive the data. Similarly, Phil Jones has refused to reveal which temperature stations he used for the HadCRUT3 data set, despite my Freedom of Information Act request for the data. And this is just the tip of the iceberg.
This blind acceptance of some scientist’s claim that “X allows me to reconstruct a thousand years of climate” reached its peak with the Hockeystick. Until Steve McIntyre tried to actually unravel what Mann had done, it was accepted world-wide as good science … despite the fact that it was fatally flawed, as the NAS agreed, by the facts that:
• Early segments of the MBH reconstruction fail verification significance tests, a finding later confirmed by Wahl and Ammann and accepted by the NAS Panel.
• Far from being “robust” to the presence or absence of all dendroclimatic indicators as Mann had claimed, McIntyre showed that results vanished just by removing the controversial bristlecones, a result also confirmed by Wahl and Ammann and noted by the NAS Panel.
• McIntyre showed that the PC method yielded biased trends because it was calculated incorrectly, an effect confirmed by the NAS and Wegman panels.
• McIntyre showed that pivotal PC1 was not a valid temperature proxy due to non-climatic contamination in the dominant-weighted proxies (bristlecones, foxtails). Here again the NAS panel concurred, saying that strip-bark bristlecones should not be used in climate reconstructions.
Mann fought like crazy to avoid revealing any of this, because he knew what he had done. He knew that he had calculated the R^2 statistic, and that it showed that his results were not significant, so he claimed that he had never calculated it … but when his code was revealed, there it was, he had calculated it. He knew that the results were not robust, he had calculated that as well, and put the unwanted results in a folder called “CENSORED” …
And that kind of thing is why transparency is so important in science. Without transparency, anyone can make any kind of mistaken, false, or fraudulent claim and never have it come to light. Now, most climate scientists are not involved in fraudulent science like that. But most climate scientists are also not trained statisticians. They often use what they call “novel” statistical methods to obtain their results. Again, without transparency of data and procedures, these “novel” methods cannot be subjected to a proper statistical analysis.
Finally, to close the circle, many of the methods used to assign the 95% confidence intervals are simply incorrect. Climate statistics (temporal records of rainfall, humidity, temperature, etc.) are known to be non-Gaussian, non-stationary, and subject to both short- and long-range autocorrelation. Because of this, normal statistical methods do not apply to these datasets; special methods must be used. Far too many climate scientists either do not know this or ignore it. As a result, their error estimates can be off by orders of magnitude.
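As one illustration of the kind of adjustment I mean (only a sketch, assuming a simple AR(1) error structure, which is itself a simplification of real climate records), the effective number of independent observations in an autocorrelated series is far smaller than the nominal count:
[code]
import math

def effective_sample_size(n, r1):
    # Effective number of independent observations for an AR(1) series
    # with lag-1 autocorrelation r1 (a standard first-order adjustment).
    return n * (1 - r1) / (1 + r1)

n = 120      # e.g. 120 monthly values (made-up, purely illustrative)
r1 = 0.8     # a lag-1 autocorrelation typical of temperature series

n_eff = effective_sample_size(n, r1)

# The standard error scales as 1/sqrt(n_eff), so treating the 120 points
# as independent understates the uncertainty by roughly this factor:
inflation = math.sqrt(n / n_eff)
print(f"Effective sample size: {n_eff:.0f} out of {n}")
print(f"Confidence intervals need to be about {inflation:.1f} times wider")
[/code]
With a lag-1 autocorrelation of 0.8, the 120 points behave statistically like about 13 independent ones, and error bars computed as if they were independent come out roughly three times too narrow.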
The tragic reality is that climate science has become so politicized that we cannot trust what anyone says, on either side of the discussion. Whether through misunderstanding, mistake, mischance, or mischief, far too many climate studies are fatally flawed. That is why transparency is not an option, but a requirement, if we are ever to unravel this most complex question that we call climate.
I think your discussion of the NAS report is quite revealing. It seems to me that when you see something that doesn’t make sense to you, you quite quickly jump to the conclusion that you are right and they are wrong. So, you conclude that the NAS report is “almost clinically schizophrenic”. In fact, I (and presumably the authors of the report themselves) see it much differently than you. For example,
I don’t think that they say that the Mann method caused the hockeystick shape. What they say is what I quoted above, namely that the method used by Mann is not a recommended one because one can create artificial data sets where it behaves badly but that, in actual fact, it turned out that it “does not appear to unduly influence reconstructions of hemispheric mean temperature”. Also, with respect to certain kinds of proxies, yes, they see some potential problems with the strip-bark pines, but that does not mean that they think any reconstruction that contains them is garbage. At any given time in any scientific field, there are “issues”, i.e., things that one can identify as most in need of improvement. However, that does not mean that every result in the field is garbage. As practicing scientists, the members of the NAS panel know that it is not black-and-white, either-or. So, while they feel that these issues limit their confidence in conclusions regarding temperatures before 1600 A.D., they do not argue that we know nothing whatsoever about this.
Sorry…but I think everyone has their biases. I do and you do. And, in fact, in my discussions with you, I think it has been quite clear to me anyway that your biases strongly influence your opinions about certain papers in the literature, some (like that paper on global temperature by McKitrick) that you seem to accept quite uncritically and others which you go through with a fine-toothed comb or refuse to even take the time to understand the arguments in (like the Santer et al. paper about the discrepancies between temperature datasets in the tropics).
That is why one can’t trust the conclusions of any one scientist, including yourself, and must look at the general view in the field.
I am sorry that you took offense to what I said. I didn’t mean to imply that you haven’t done your homework. I think it is clear to all of us that you have spent a great deal of time investigating stuff in this field. However, this doesn’t mean you do not have your own strong biases that influence your conclusions. I think this might be further magnified by the fact that, unlike many of us scientists, you have (as I understand it) received your scientific training in this one field where you and others (on both sides) have very strong biases and prejudices. I.e., you haven’t had the experience of doing science in a field where you weren’t in the position of strongly advocating for one side or the other in a pretty polarized debate.
I do admire the intelligence and diligence that you bring to your studying of these issues in climate science. However, please understand that I do remain skeptical when you reach conclusions quite at odds with most of the other scientists working in the field.
A good deal of what you write in this post makes sense. Some of it doesn’t. What do you mean by “self-selected”? Usually when we say that in the social sciences, we mean that there was some mechanism by which the respondents in the data set identified themselves. A clinical sample, people responding to an ad, people showing up to vote, all of those will differ from the population as a whole because they did something to select themselves.
How is it that our global temperatures have distinguished themselves from other global temperatures? They are what they are. If they’ve gotten hotter, there must be some mechanism, but to say that they are self-selected suggests some sort of agency on the part of temperature, and some host of other, contemporaneous, temperatures that they’ve distinguished themselves from (other than by simply changing over time).
True, but I guess I don’t see this as relevant to your point. It sounds like you are pointing out a large number of possible other domains to consider for concurrent changes, but this doesn’t get at your underlying point about the meaning of the 95% confidence interval.
This is why there are corrections to be employed for multiple comparisons. Even if you look at one comparison in one data set, adopting the alpha level of .05 means that you are accepting an error rate of 5 in 100. Put another way, you’re saying that you’d expect to see differences of a given size in variance between groups arise simply by chance (rather than by the mechanism of the study) fewer than 5 times out of 100. That’s why replication with different data sets is important. Once you show significant differences on a given factor in two different data sets, your confidence in the robustness of the relationship goes up.
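For what it’s worth, the simplest such correction (Bonferroni) can be sketched in a couple of lines; this is just an illustration assuming independent comparisons, not a prescription for any particular study:
[code]
def bonferroni_threshold(alpha_family, n_comparisons):
    # To keep the family-wise error rate at alpha_family, each individual
    # comparison is tested at a proportionally stricter level.
    return alpha_family / n_comparisons

# Hypothetical example: 20 factors tested against a single outcome.
per_test_alpha = bonferroni_threshold(0.05, 20)
print(f"Each of the 20 tests must reach p < {per_test_alpha:.4f}")
print("for the family-wise error rate to stay at 5%.")
[/code]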
I don’t understand this, or its relevance here. Are you saying that the multiple comparisons concern holds for the theoretical number of comparisons that could be made? That can’t be it, because that doesn’t make sense.
[quote]
The only reason that we have spent so much time investigating the correlation between CO2 and temperature is because we already knew that temperatures were increasing steadily. But if temperatures hadn’t been increasing steadily then we wouldn’t have investigated it to the extent that we have.
[/quote]
Yes. It’s an observed phenomenon. That’s irrelevant to the issue of multiple comparisons or the acceptance of a false positive rate of 5 times in 100. That’s like saying that there is something suspect about studying what influences depression, or crime rates, or cancer deaths, just because your attention was drawn to them. What else is science about but trying to explain observed phenomena?
Again, it means that using the statistical distributions we do, we would expect observed differences of a given size or greater to arise purely by chance 5 times in 100. Meaning that any single given difference could fall in that five percent range, and that if you run 20 comparisons within your data set, you are increasing the risk of calling something significantly different due to your explanatory variables when it was really due to chance. This does not mean that the multiple comparisons problem extends across data sets.
Hopefully this is a typo, because there is no such standard as accepting only a 95% correlation (whatever a 95% correlation would even mean). I’m hard pressed, however, to figure out what you might have meant otherwise.
Hopefully you can clarify some of this, because as it stands, it is pretty confusing, and seemingly erroneous.
These two paragraphs (and the rest of what follows in your post) are completely wrong on the history of the theory. That is not how it happened at all. Global temperatures weren’t self-selected because they did something dramatic. In fact, when Arrhenius first calculated what sort of warming a doubling of CO2 might cause, it was a theoretical exercise, not one based on any belief that such a rise had yet even begun to occur…but that it might eventually if we kept burning fossil fuels. And, when James Hansen was before Congress in 1988 and argued that the signal due to greenhouse warming had emerged from the noise, many scientists were dubious. After all, the temperature…after having risen during the first part of the century had remained steady or even dropped a bit during the middle part of the century and had only started rising again in the 1970s. Many scientists at that time didn’t yet think it was a significant trend at all. And, of course, on the basis of the AGW theory, Hansen made the prediction that this rise would continue, a prediction that turned out to be correct.
Your whole post entirely neglects the fact that the evidence for AGW is not primarily simple statistical correlation of the sort considered, e.g., in the medical sciences. It is not searching in the dark for correlations. Rather, it is based on mechanistic understanding. The evidence from temperature reconstructions of the last millennium or so is but one piece of evidence (and certainly the piece that is most circumstantial and statistical in nature, as well as suffering from real data quality issues as intention has correctly noted even if overstating); however, even for it, I think you are on pretty weak ground to be arguing that people could have found any number of other factors to focus on as anomalous. Global temperature was the obvious factor to focus on based on our mechanistic understanding of what CO2 ought to do. Now that climate models have advanced to the point where more complex questions can be considered, there is more emphasis on starting to investigate what other effects this will have on climate, e.g., extreme events like droughts and floods and heatwaves and hurricanes. However, the emphasis was initially focused on global temperature because the understanding of the greenhouse effect made it the obvious thing to consider.
Before this drops away entirely, I’d just like to say that I was hoping Blake or intention could come back to respond to my post and help to clear up some of my confusion about how the concerns about multiple comparisons really are problematic here.
Just a note about intention’s issue of transformations - what I typically think of when the term “transformation” is used in regards to data is just a fairly simple and straightforward transformation to the data itself. Taking the square, log or square root, or adding a constant - such transformations are common, and are not at all a threat to the integrity of the analyses. They can complicate the interpretation of regression parameters, perhaps, but there is nothing unsound about them. They aren’t gaming the analyses in any way.
The examples he gives sound more like operationalizations. Like saying, “we’ll call temperature the amount indicated by the height of a column of mercury.” Whether any concerns arise from the examples he gives is not my domain, but I just wanted to clarify any confusion that might arise associated with what I would think of as mathematical transformations.
Hentor, thanks for the post. To answer the last question first, “mathematical transformation” is a general term for transforming one dataset into another, using one or more of a huge variety of mathematical operations (linear algebra, matrix algebra, logarithms, Kalman filtering, wavelets, differencing, etc. ad infinitum). A quick Google search, for example, finds the term applied in ways such as
Note that this is exactly the sense in which I used it, that of mathematically transforming a dataset in one domain (tonometry, remote sensing data, blood pressure, tree ring width) into another domain (Heart sounds, central aortic pressure, temperature, vegetation change).
While “operationalization” has a similar meaning, it seems to be used mostly in the social sciences. “Transformation”, on the other hand, is used in signal processing. This is what we are doing in climate science, as evidenced by the use of the term above regarding satellite data and vegetation growth. I am using the more common term in the context of climate science when I refer to this as “transformation”.
Next, you said that you wished Blake or I could “clear up some of [your] confusion about how the concerns about multiple comparisons really are problematic here.”
I fear that I am the one who is confused now. Perhaps you could restate your concerns, so that I could answer them directly.
Thanks, Hentor, that makes it much clearer. Regarding your questions:
Let’s take a real example, the transformation of tree rings to global temperature. In general, the transformation procedure is:
a) Select some tree rings to use as proxies.
b) Compare the tree ring data to historical temperature data during part (the “correlation period”) of the overlap period of the two datasets, and define a mathematical transformation of tree ring width to temperature.
c) Use the transformation to hindcast the temperature during the other unused part of the overlap period (the “verification period”), and determine if the transformation gives a statistically significant result.
d) Use the verified results to estimate the reconstructed temperature for historical periods for which we have no temperature records.
e) Estimate the errors in the historical period.
As you can see, there are large statistical issues in many parts of this, including:
a) How well does the reconstruction match the temperature during the verification period?
b) How well does the reconstruction need to match the verification temperature in order to be considered valid?
c) Does the tree ring reconstruction temperature need to match the local temperature as well as the global temperature to be considered valid, and if so, how well?
d) How many tree ring datasets are needed to define a global temperature?
e) What is the estimated correlation coefficient (R^2) for the historical reconstruction?
f) What are the error estimates for the various historical periods of the reconstruction?
So yes, transformations can involve very important statistical issues, considerations and questions.
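To make the calibration/verification split concrete, here is a minimal sketch of the kind of procedure I described in steps (a) through (e), using made-up numbers and ordinary least squares; the variable names and the choice of a simple r^2 verification statistic are mine, not taken from any particular reconstruction:
[code]
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 100 years of overlap between a tree-ring index
# and an instrumental temperature record (purely illustrative numbers).
years = np.arange(1900, 2000)
temp = 0.01 * (years - 1900) + rng.normal(0, 0.2, years.size)
rings = 1.5 * temp + rng.normal(0, 0.3, years.size)

# (b) Calibrate a linear ring-width-to-temperature transformation on the
# first half of the overlap ...
cal = slice(0, 50)
slope, intercept = np.polyfit(rings[cal], temp[cal], 1)

# (c) ... and verify it on the withheld second half.
ver = slice(50, 100)
predicted = slope * rings[ver] + intercept
r2 = np.corrcoef(predicted, temp[ver])[0, 1] ** 2
print(f"Verification-period r^2: {r2:.2f}")

# (d)/(e) Only if the verification statistics hold up would the fitted
# transformation be applied to the pre-instrumental ring widths, with
# error bars that reflect both the fit and the verification skill.
[/code]
Every one of the statistical questions listed above lives somewhere in a procedure like this, which is why the choices made at each step matter so much.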
Blake can correct me if I am wrong, but I believe he is saying that out of the hundreds of climate variables (temperature, humidity, tropospheric lapse rates, etc.), scientists have self-selected one of them.
Type I error is a “false positive”, meaning that we falsely think that something is significant when actually it occurs by chance. This type of error is very relevant to climate science. The null hypothesis is that a given change in climate records occurs by chance, and a false positive means that we assume that the change is actually due to some external factor, oh, say, change in CO2 …
The chance for false positives in climate studies is greatly increased because of the general autocorrelation of climate records. To quote from here:
Note that many of these problems involve Type I errors, and that they can be very large if autocorrelation is large. Temperature datasets typically have an alpha (lag 1 correlation) of about 0.8, leading to a huge chance of Type I errors.
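To illustrate how large the effect can be, here is a rough simulation sketch. It is purely illustrative: it assumes the null is a simple AR(1) process with a lag-1 correlation of 0.8, and it tests for a trend with a naive ordinary least squares test that ignores the autocorrelation entirely:
[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
phi, n, trials, alpha = 0.8, 100, 2000, 0.05

t = np.arange(n)
false_positives = 0
for _ in range(trials):
    # AR(1) noise with no trend whatsoever (the null hypothesis is true).
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()
    # Naive trend test that ignores the autocorrelation.
    if stats.linregress(t, x).pvalue < alpha:
        false_positives += 1

print(f"Nominal Type I error rate: {alpha:.0%}")
print(f"Observed rate with AR(1), phi = {phi}: {false_positives / trials:.0%}")
[/code]
Run something like that and the “5%” test flags a trend in a large fraction of the purely random series, which is exactly the false positive problem I am talking about.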
Again Blake can correct me if I am wrong, but I believe he is talking about a 95% confidence interval (p<0.05).
This is an arbitrary level. It means that the odds of the findings occurring due to random fluctuations in the data (false positive, or Type I error) are less than one in twenty (5%). This level is commonly used in scientific studies, but it depends on the consequences of false positives. If a false positive would make a large difference, a 99% confidence interval is sometimes used. The IPCC, on the other hand, uses a 90% cutoff.
Can you tell me how a scientist could “other-select” something? The way you and Blake are using “self” makes absolutely no sense.
Thanks for the didactics. Perhaps I didn’t make clear that I’m quite familiar with this.
Only if you are using incorrect analytic procedures, but why would you? Is there some reason why researchers wouldn’t use techniques to account for correlated observations?
Sure, except that he didn’t say that. Which is one of the things I asked for clarification of.
Okay, now it isn’t clear if this is being didactic or pedantic. This is irrelevant to the “95% correlation” that Blake brought up.
Good scientists use something called “ex ante” criteria. This means that you select some criteria for the phenomena or items of interest, and then you look at the particular instances that fit those criteria. “Ex ante” means that you select the criteria first. An example might make things clearer.
Out of the hundreds of ways to examine the climate, scientists are mainly looking at the reasons for the recent (20 - 30 year) rise in temperatures. That is self-selection.
On the other hand, your ex ante criteria could be that you are interested in any 30-year temperature warming trends in the HadCRUT3 1850-2005 dataset that are statistically warmer than the rest of the historical trends … except, oops, the current 30-year trend fails that test.
Or you could look at the 20-year trends using the same criteria … except the recent trend fails that test too. Both the recent 20-year and 30-year trends are not statistically different from the corresponding trends leading up to the 1940s peak in warmth. A scientist wiser than I once said “Before we waste too much time trying to explain the nature of a phenomenon we should first confirm that the phenomenon exists.” Using ex ante criteria, rather than self-selecting the phenomenon, helps us to avoid that error. Before we run off to find an explanation for the recent warming, it lets us know that the recent warming trend is not statistically remarkable in any way.
Here’s a second example. Michael Mann selected a number of tree ring and other proxies for his famous “Hockeystick”. However, he left out a number of other equally valid proxies. Why? Because rather than using “ex ante” criteria for proxy selection, he simply self-selected them. His chosen proxies made a “hockeystick” … and the other proxies didn’t. Coincidence? You be the judge. But whether the selection was accidental or deliberate, using “ex ante” criteria in place of self-selection avoids this kind of bias.
I figured you were familiar with this, Hentor, but we’re fighting ignorance here, and I’m also sure that everyone following the thread isn’t familiar with this. I prefer to keep my explanations as general as possible. Statistics is a daunting subject, so I try to explain it as I go along so even non-mathematicians can follow the discussion.
It didn’t seem that you were familiar with why this is an issue in climate science, though … which is why I went on to give you the example that you haven’t commented on. This was not a theoretical example, it brought up issues that have been mis-handled in a number of tree-ring paleoclimate studies, from the Hockeystick right up to the present.
The main reason seems to be that climate scientists don’t know much about statistics. You’d be amazed at the number of climate science papers which don’t make any allowance for autocorrelation.
A second reason is that the exact statistical procedures to use in a given part of a fairly complex transformation, like tree rings to temperature, are not always clear or well defined.
The third reason is simple ignorance. Take a look here for a particularly egregious example of statistical nonsense, this one from the IPCC itself.
Guess I’ll have to let Blake answer these last two questions, then.
So you use the terms ex ante and self-selection interchangeably? How truly bizarre! Let me make things clearer for you. Self-selection refers to a process by which individuals might bring themselves to the attention of the researchers, meaning that there is something going on that must be accounted for or at least acknowledged by the researcher. Here is an explanatory Wikipedia link right back atcha. I’d recommend striving for more precision in your use of terms. It certainly can get a bit confusing.
It is especially confusing when you start suggesting that taking note of and studying some phenomenon is questionable because you “self-selected” it. To be sure, the potential problem of unmeasured explanatory variables is ever-present, but it would be far more erroneous to leave out a clearly related factor from the analysis than to include it.
Hentor, mea culpa, you are right about “self-selection”, I was using the term incorrectly. You are 100% on target.
My related points about “ex ante” selection, while correct, do not have to do with the question of self-selection. jshore was using it incorrectly also, saying e.g. “Global temperatures weren’t self-selected because they did something dramatic.” Ah, well, live and learn, the fight against ignorance continues on all fronts.
Having disposed of that question, perhaps you could comment on the other issue, where you said “Any constant transformation (regardless of how many you want to list) is not a statistical problem, so I don’t see the concern.” I have provided a variety of citations and examples showing that transformations involve major and very important statistical problems in climate science, and you have not responded.
Sorry to have not caught this question before. I don’t understand your concern. Transformations are legitimate techniques to deal with some conditions within the data. Most typically, they are used to bring the distribution into a shape for which one can use normal theory statistical techniques. For inferential statistics, if you transform the data essentially so that the relative positions of the observations are not themselves jumbled, there is no concern about the legitimate application of inferential statistics. Your parameter values (betas) will change, and your interpretation of the values may change, since you may have changed the scale of the original data, but the statistical tests to determine significance will still be the same.
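As a minimal illustration of the simplest case I have in mind, a linear rescaling such as a change of units (the numbers below are made up), the parameter estimate changes but the significance test does not:
[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

raw = stats.linregress(x, y)
rescaled = stats.linregress(x, 100 * y + 32)   # e.g. a change of units

print(f"slopes:   {raw.slope:.3f} vs {rescaled.slope:.3f}")    # betas differ
print(f"p-values: {raw.pvalue:.3g} vs {rescaled.pvalue:.3g}")  # tests identical
[/code]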
Now, I don’t know the climate research very well at all to be able to comment on that aspect of things. Do you have examples that clearly demonstrate people making transformations of the data that do shift the relative positions of the observations?