YAGWT - Yet Another Global Warming Thread

I gave an example before, I think you might have missed it. I said:

That was the example I was asking you to explore.

w.

Begging everyone’s pardon for jumping in 80 posts into the thread, but there’s something here I just don’t understand.

What is the transformation here that you object to? You’ve provided a very vague example, with no actual numbers that we can examine.

When I think of a statistical transformation, I think of this type of example: I have data that are heteroscedastic, and therefore do not meet the required assumptions to perform a simple Model I ANOVA. I take the logarithm of the data, and examining the logarithms, I see that the transformed variables are homoscedastic. Now I can perform my ANOVA.
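To put actual numbers on that example, here is a minimal sketch in Python, with made-up lognormal data standing in for the measurements:

```python
# A minimal sketch of the log-transform-then-ANOVA workflow described
# above, using made-up lognormal data (not anyone's real measurements).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Three groups whose spread grows with their mean: heteroscedastic on
# the raw scale, roughly homoscedastic after taking logs.
groups = [rng.lognormal(mean=mu, sigma=0.4, size=30) for mu in (1.0, 1.5, 2.0)]

# Levene's test: a small p-value means the variances differ.
print("Levene (raw):", stats.levene(*groups).pvalue)

logged = [np.log(g) for g in groups]
print("Levene (log):", stats.levene(*logged).pvalue)

# With the variance assumption now satisfied, run the one-way ANOVA
# on the transformed data.
print("ANOVA (log):", stats.f_oneway(*logged).pvalue)
```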

If someone objects to my results and says my transformation was suspect, we can now talk about whether the logarithm was an appropriate transformation.
Your reply to Hentor’s question:

…In a rather non-linear fashion, not actually providing an example that shifts the relative position of the observations. In fact, an example without any actual observations, just a category of observations that might be made (tree rings.)

In short, could your example become a little more specific so we can examine the transformation with which you disagree?

i.e. What is the equivalent of the logarithm that you suspect?

It’s also possible that I misunderstand you, and that you don’t have a disagreement with a statistical transformation, but instead with this:

The relationship between tree rings and temperature is governed by physical factors in the real world, and so is not necessarily the same as the type of purely statistical transformation I gave an example of above. To avoid confusion, I’ll call it the “proxy relationship” instead. Is this where you have a disagreement? If so, it’d be nice to know what specific proxy relationship between tree rings and temperature you object to.
You ask some very good questions, intention:

And shouldn’t we try to answer these questions rather than just asking them? We can do that much better if you’ll let us know which paper you’re talking about.

wevets, you need no pardon for coming in whenever you arrive; you raise good points.

The reason we started discussing the question was the claim by Hentor the Barbarian that transformations don’t involve statistics … he said:

For a specific example that illuminates many of the issues I am pointing to, we could take a look at the ill-fated “Hockeystick” paper of Mann, Bradley, and Hughes. There’s a good discussion of one of the many statistical aspects of the Mann transformation, the calculation of the confidence intervals, located here.

It appears that I am using “transformation” in perhaps a more general sense than you are. Mann, for example, starts with a dataset that consists of a time-series of tree ring widths, and ends up with a dataset that he says represents a time-series of Northern Hemisphere temperatures. This is the type of transformation to which I am referring. Call it a “proxy relationship” if you wish, but the subject matter of the datasets is immaterial. One real-world dataset is transformed, through a variety of mathematical operations, into another real-world dataset. This could be satellite microwave strength data transformed into atmospheric temperature data, or French grape harvest dates transformed into summer temperature data. The subject matter is not the point, nor is the particular type of transformation used.

In Mann’s case, it is a “principal components” (eigenvectors) method which is used to transform one dataset into another. Unfortunately, he made some errors in the process … and he has not revealed his method for calculating the uncertainties in the process. But regardless of whether he did the math correctly (he didn’t) or revealed his mystery method for calculating the uncertainty (he didn’t), there are a wide variety of statistical questions involved in the whole procedure.
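To make the shape of such a transformation concrete, here is a toy sketch in Python of a generic principal-components proxy calibration. The data and every step in it are invented for illustration; it is not Mann’s method, which has its own (disputed) details:

```python
# A toy sketch of a principal-components proxy reconstruction, just to
# make this kind of transformation concrete. Everything here is
# synthetic and generic -- it is NOT Mann's actual procedure.
import numpy as np

rng = np.random.default_rng(0)
n_years, n_proxies = 200, 20

# A pretend "true" temperature history and 20 noisy proxies tracking it.
true_temp = np.cumsum(rng.normal(0, 0.1, n_years))
proxies = (true_temp[:, None] * rng.uniform(0.5, 1.5, n_proxies)
           + rng.normal(0, 0.5, (n_years, n_proxies)))

# Standardize each proxy, then extract the leading principal component.
z = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
_, _, vt = np.linalg.svd(z, full_matrices=False)
pc1 = z @ vt[0]

# Calibrate PC1 against the "instrumental" record (last 50 years), then
# apply the fit to the whole record to get the reconstruction.
cal = slice(n_years - 50, n_years)
slope, intercept = np.polyfit(pc1[cal], true_temp[cal], 1)
reconstruction = slope * pc1 + intercept

# Verification: correlation outside the calibration window.
r = np.corrcoef(reconstruction[:150], true_temp[:150])[0, 1]
print(f"verification correlation: {r:.2f}")
```

The statistical questions enter at every step of such a chain: the standardization, how many components to keep, the choice of calibration window, and the uncertainty of the final fit.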

I have listed a number of statistical questions relating to the type of transformation done in Mann’s paper, including:

I was not looking for answers to these questions. I was merely pointing out that, contrary to Hentor’s claim, there are a number of interesting and difficult statistical questions involved in the types of transformations used in climate science. Anyone who wants to take a hard look at the majority of climate science papers and studies has to be able to understand the statistics involved. Many of the current claims of climate science are based on studies which ignore even very basic statistical principles and concepts. As my father used to say, “Son, the large print giveth, and the small print taketh away.” And in climate studies … statistics are the small print.

All the best to you,

w.

In the physical sciences we mean essentially the same thing: the data have drawn themselves to our attention rather than having been collected post hoc.

By the mechanism I outlined above: we have only spent billions of dollars investigating possible correlations with temperature because the temperature is noticeably increasing. Let me ask you this: if the global temperature had been nearly perfectly stable for the past 100 years, do you honestly believe we would have spent the same amount of money investigating the correlation with atmospheric conditions, pollution, aerosols, ENSO and so forth?

To me the answer to that seems self-evident: of course we wouldn’t have. Just as we have spent very little investigating the correlation of those factors with the sex ratios of crocodiles or the incidence of fire in the Kalahari. Properties that remain stable aren’t intensely investigated.

And that is where the problem comes in. How many times have you read of a “disease cluster” associated with some elementary school, or small town, or profession? People then go out of their way to investigate the cause, and almost always it turns out to be nothing and vanishes within a generation. That’s because it’s a self-selected sample. The only reason anyone investigated the causes of the disease was that a lot of people in one place got the disease.

But the problem is that disease is random, and most people don’t understand what random means. Random doesn’t mean evenly distributed; it means unpredictable. Often random events will be evenly distributed, but occasionally they will also form distinct clusters and trends. The problem is that we notice the clusters and trends; they draw themselves to our attention; they self-select.
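A quick simulation makes the point; the numbers are invented, but the clustering is exactly what randomness produces:

```python
# Scatter 1,000 disease "cases" uniformly over a 10 x 10 grid of towns:
# no cause at all, yet some towns look like alarming clusters.
import numpy as np

rng = np.random.default_rng(7)
x = rng.integers(0, 10, size=1000)
y = rng.integers(0, 10, size=1000)

counts = np.zeros((10, 10), dtype=int)
np.add.at(counts, (x, y), 1)  # tally cases per town

print("expected per town: 10")
print("largest 'cluster':", counts.max())  # typically around 20
print("quietest town:    ", counts.min())  # typically 2-4
```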

The same applies to global temperature. Someone didn’t set out one day to do a full analysis of global temperature with all the dozens of factors that are needed to make the current AGW models work. Instead someone noticed the temperature was rising and then set out to find out why. But as soon as they did that they introduced a massive potential flaw into any future science: they were using a self-selected sample and thus had increased their sample space infinitely.

With a >90% correlation, if we look at 10 different factors then CO2 will always correlate with one of them. That is basic statistics. In this case the correlation is with global temperature. But what about the other possible factors? They are our sample space.
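That arithmetic is easy to check by simulation. The series below are made-up random walks, standing in for any ten candidate factors:

```python
# Checking the "look at 10 factors and one will match" arithmetic with
# made-up random walks -- nothing climatic about these numbers.
import numpy as np

rng = np.random.default_rng(1)
n = 25  # twenty-five "years" of annual data

# A steadily rising series standing in for CO2.
co2 = np.arange(n) + rng.normal(0, 1, n)

# Ten unrelated random walks standing in for candidate factors.
factors = np.cumsum(rng.normal(0, 1, (10, n)), axis=1)

corrs = [abs(np.corrcoef(co2, f)[0, 1]) for f in factors]
print(f"best of 10 spurious correlations: {max(corrs):.2f}")
# Random walks drift by accident, so the best of ten routinely
# correlates with the rising series at 0.7 or better.
```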

For example, what if global temperatures had been stable when all this kicked off in the 1980s, but the number of male crocodiles being born had been increasing dramatically? Or if the incidence of fires in the Kalahari had been declining dramatically? And so on and so forth for another 10 possible factors that could be attributed to CO2 levels. We can be absolutely sure that one of those factors would have been correlated with CO2 levels to a 90% level.

And that is how our global temperatures are different to other possible global temperatures: they are different precisely because they were increasing dramatically in the mid-1980s. They drew themselves to our attention. If they had not been changing, then some other factor would have been just as strongly correlated with CO2 levels.

It is the very fact that they were increasing rapidly that selected them. They formed a trend, and the human mind is designed to notice trends and clusters. Once we noticed that trend we went out and looked for something to explain it, but at only a 90% correlation we know that many trends had to correlate with rising CO2 levels. That doesn’t mean they are causative, because our sample space is far larger than 10 possible factors.

Yes, but more importantly it is why it is absolutely vital to know your sample space.

Imagine I conduct a trial on plant growth under CO2 enrichment; then my sample space is 1: a single trial. I find that my plants produce more biomass than the last crop grown in that plot. In that case a 95% confidence level is acceptable, because the chance of having got that result by chance is small.

Now imagine that while driving I notice that a single garden plot growing in an industrial area is producing large plants. I collect data from that site and report a 90% confidence that the growth is caused by CO2 levels. Is that result worthwhile? No, of course it isn’t, because you have no idea what my sample space was. I could have driven past a million sites growing right next door, each day, and ignored them because they were ‘normal’.

Now do you understand why self-selected samples are statistically dodgy? They are dodgy because we can never know what our sample space was. What we saw could very easily coincide with the factors measured simply because we unintentionally sampled just one of millions of data points.
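The arithmetic behind that, assuming the screened candidates are independent:

```python
# The family-wise arithmetic behind an unknown sample space: assuming
# independent candidates, P(at least one spurious "hit") = 1-(1-a)^N.
alpha = 0.10  # a single result reported at "90% confidence"
for n in (1, 10, 100, 1000):
    p_any = 1 - (1 - alpha) ** n
    print(f"{n:>4} candidates screened -> P(spurious hit) = {p_any:.3f}")
```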

Only if you incorporate all the datasets you examined, i.e. your sample space.

For example, if I notice plants growing larger in two CO2-enriched environments, but every day I drove past two thousand instances of no effect and three instances of shrinkage, then my confidence in the robustness of the relationship doesn’t increase at all. And when I only start to collect data *because I have seen* a size increase, that is exactly what I have been doing.

No, it isn’t irrelevant, because it entirely defines our sample space. We only started looking for correlation after we knew of the phenomenon and, more importantly, because it existed.

No, no, no.

What it is like is noticing that the people in a certain school are suffering more cancer deaths and then going out looking for a cause for the increase based only on that school. If the cancer deaths in that school happen to correlate well with the CO2 levels, you then attribute the cancer to CO2.

Can you not see how statistically invalid that methodology is? What about the school right next door, where the CO2 levels are identical and people suffer no more cancer than the general population? What about the millions of other schools where there is no change in cancer rates?

With a self-selected sample you have to be very careful that you aren’t trying to find a cause for something that is a statistical artefact. There is no cause for the increase in cancer in the school, so there is no point trying to explain it. If you restrict your study to that one school, I will guarantee that you will find multiple factors that correlate with the cancer increase at >90% confidence. That doesn’t prove causation. It just proves that if you select an event because it forms a trend or cluster, then you will find numerous correlative factors at a 90% confidence level.

Religion and philosophy are also about explaining observed phenomena. To be called science we need a few additions:

  1. It is about replicability. In the case of AGW we have none.
  2. It is about predictability. In the case of AGW we have none.
  3. It is about statistical rigor. In the case of AGW we have massive problems here.
  4. It is about logically valid argument. As soon as we start ignoring sample space we have no logical argument.

I’ve said it before and I’ll say it again: when someone uses AGW theory to make a prediction about the real world that couldn’t be made to the same confidence by assuming a constant trend from 1860, then I will call it science. Until then it’s just not science.

Blake, I guess you must have missed my post #71 or else you would not have repeated your ahistorical nonsense or at least would have responded to what I said.

Again, as I noted, try James Hansen in the late 1980s. He made the prediction that the temperatures would continue rising at a time when many if not most scientists thought it was too early to claim that any temperature trend due to rising CO2 levels had actually emerged from the noise.

Oh, please. The temperature graph included with Hansen’s 1988 prediction shows that the temperature bottomed out in 1964, and rose thereafter. This means that the global temperature had been rising for almost a quarter century when James Hansen made his oh-so-daring prediction that temperatures would continue to rise. What rate did he predict they would rise at? Well … at the same rate that they had been rising for a quarter century …

And if the trend due to rising CO2 levels had “actually emerged from the noise” since 1988, we’d have seen it in the record. This is not the case: the trend of the post-1988 rise is statistically indistinguishable from the trend of the 1915-1945 rise, despite a much greater increase in CO2 during the recent rise.
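Anyone who wants to check a claim like that can do so: fit a least-squares trend to each period and see whether the slopes differ by more than their combined standard errors allow. Here is a sketch in Python with synthetic stand-in data; to do it for real, use the actual temperature record, and account for the autocorrelation this sketch ignores:

```python
# A sketch of how to compare two warming trends: OLS slope per segment,
# then a z-test on the difference of slopes. The numbers below are
# synthetic stand-ins, NOT the real temperature record.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def trend(years, temps):
    """Return the OLS slope (deg/yr) and its standard error."""
    res = stats.linregress(years, temps)
    return res.slope, res.stderr

# Two made-up segments warming at roughly the same underlying rate.
y1 = np.arange(1915, 1946)
t1 = 0.015 * (y1 - y1[0]) + rng.normal(0, 0.1, y1.size)
y2 = np.arange(1988, 2008)
t2 = 0.016 * (y2 - y2[0]) + rng.normal(0, 0.1, y2.size)

s1, se1 = trend(y1, t1)
s2, se2 = trend(y2, t2)

# Simple z on the slope difference; a careful analysis of real climate
# data must also correct for autocorrelation, which inflates se.
z = (s2 - s1) / np.hypot(se1, se2)
print(f"slopes: {s1:.4f} vs {s2:.4f} deg/yr -> z = {z:.2f}")
```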

w.

:confused: I’m trying to puzzle my way through this claptrap. How do data draw themselves to our attention? What agency do they have? If you are collecting them post hoc, doesn’t that mean that you have already attended to them? What is the hoc you are working post, here?

No, obviously not, because we prioritize based on perceived importance of the problem. But again, cancer receives a lot of attention, as well, and is a legitimate issue for study, right?

Only presuming that their stability does not threaten us. If levels of violence are high and stable, they are going to get a lot more attention than low but stable violence.

You seem to be confusing generalizability from a restricted sample and “self-selection.” Sure, if your sample is not representative of the entire population, then you cannot generalize your findings. Which aspect of global temperatures do you think is not representative of the entire population? Your mangled application of the term “self-selection” still is different from the issue you see as problematic, namely that of restricted samples.

I’ve always understood why both self-selected (in the typical meaning of the term) and restricted samples are dodgy. I still have no idea what you are on about vis a vis global temperatures.

No. Multiple comparisons will be problematic in each separate data set. As long as you are using inferential statistics and establishing an alpha value, you are saying “This is the level of false positives that I will accept.” It doesn’t change until you have the entire population measured. At that point, you don’t need to use inferential statistics, because the stats would be the stats, with nothing to infer.
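A concrete example of that adjustment, using placeholder p-values and the common Bonferroni correction:

```python
# One common adjustment (Bonferroni): divide alpha by the number of
# comparisons so the family-wise error rate stays at alpha.
# The p-values below are placeholders for illustration only.
alpha = 0.05
p_values = [0.004, 0.020, 0.049]      # three hypothetical tests

threshold = alpha / len(p_values)     # 0.05 / 3 ~= 0.0167 per test
for p in p_values:
    verdict = "significant" if p < threshold else "not significant"
    print(f"p = {p:.3f} -> {verdict} at family-wise alpha = {alpha}")
```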

This is just silly. We cannot know about phenomena that don’t exist. Of course we study things that we know about, and of course we study those things we know about that might impact our lives dramatically.

In summary: Self-selection does not mean “selected by the researcher.” It means that the individuals selected themselves into the study sample.

Restricted samples are those which do not represent the population at large. This may be because of a self-selection process, or for other reasons.

It is perfectly legitimate to study something that has come to your attention, and in fact it makes zero sense to study things that you don’t know exist. You study things you know about in samples that reflect the population as closely as possible. You look for replication across samples to feel more confident about the robustness of the findings.

You use inferential statistics to make estimations about the relationships you see within the sample extended out to whatever the larger population is. You accept a certain error rate, and adjust if you are making multiple comparisons that would exceed that rate.

Well, that is a rate significant enough that, if maintained over a period of time, it causes a reasonably significant rise in temperature … and it was certainly by no means obvious to everybody that it would continue to rise. With many of the people who believe it is all the sun arguing that we are in for a cooling due to the solar cycles, it will be interesting to see what they say 20 years from now, assuming that the rise has continued.

Well, as I discussed in the other current thread on global warming, it is not just the trend but the pattern of the warming and what natural forcings are or are not occurring that are important. The early century warming can be explained by known natural forcings (along with some small contribution from greenhouse gases).

jshore, thanks for your post. You say

For a discussion of “what natural forcings are or are not occurring”, and how well they are modeled by the GCMs, see The Fine Art of Fitting Elephants.

And for a discussion of a natural forcing not included in the climate models, see here.

My best to everyone,

w.

And for a discussion of a natural feedback which is neglected by the climate models, is much larger than the CO2 effect, and has a correlation of 0.63 with the 1983-2003 temperature, see here. The idea that CO2 is required to explain the earth’s post-1980 temperature history is an inconvenient lie.

w.

intention, thanks for your post. There are various problems with the cosmic ray hypothesis but, regardless of whether or not cosmic rays have some influence in general, the biggest problem seems to be that they do not show any significant overall trend in recent decades! Shaviv claims otherwise in the comments section on that post about cosmic rays, but if you look at his Fig. 3, I don’t see how you come up with any significant trend … just some oscillations. (I think there have been other issues identified with the supposed correlation between low clouds and cosmic rays shown there, but I don’t remember what they are.) Strangely, Shaviv’s claim that there is a trend is based on a vague statement about there having been a trend in solar activity over the whole 20th century, made in a paper that completely disagrees with him on this being a plausible explanation of the late-20th-century warming.

If it is so easy to tune a climate model to give the temperature record (as you have claimed in the past), then some of these solar / cosmic ray hypothesis folks ought to be able to do this to get a good agreement with the instrumental temperature record over its entire ~150-year history using only natural forcings. Surely they must be able to gain access to one … I think there are even some models publicly available.

At some point, the folks proposing alternative hypotheses actually have to start showing in detail how they compare to the available data. It is strange that the standard seems to be that if I can cobble together any sort of plausible hypothesis, then we are all supposed to “stop the presses” and can no longer accept the dominant theory, which has lots and lots of supporting evidence.

See here for some discussion of this albedo stuff. And is a 0.63 correlation over a 20-year period really that impressive? I hardly think that the graph shown in your link looks very much like an (upside-down) plot of [url=http://www.cru.uea.ac.uk/cru/info/warming/]the global temperatures[/url] over the same time period.

By the way, you have chided me before for looking at what you (in that case incorrectly) called a “press release” on a scientific article. In the link you provided, you have to click on a link within it just to get as close as a press release on the paper! Here is the full paper; it is interesting that, while noting that these changes are something whose origin and effects we need to understand better, they don’t make any claims that their observations are in conflict with the AGW hypothesis. Here, for example, is the first paragraph of their paper:

Is a 0.63 correlation over a 20-year period “impressive”? Correlations in the world of climate don’t generally run all that high. Michael Mann’s hockeystick is based on tree rings with correlations far worse than 0.6 - 0.7. The comparable correlation of CO2 with temperature over the same period is about the same, 0.60, yet that seems to impress you a lot. Why is the correlation of temperature with CO2 convincing to you, but not the correlation of temperature with albedo?

Your thesis has been “CO2 explains the recent warming, and the sceptics have no other hypothesis to explain it”. I provide a hypothesis (albedo changes) and you say it doesn’t explain it well enough … you’re grasping at straws. The reality is that nothing explains climate very well; that’s why the discussion continues.

I didn’t link to the full paper because it requires a subscription … my bad. I should have linked to both the article and the subscription-requiring original paper.

Are their observations “in conflict” with the AGW hypothesis? I haven’t a clue, because I don’t know what the “AGW hypothesis” is. If the AGW hypothesis is that humans cause all of the global warming through GHG increase, yes, it is in conflict. If the hypothesis is that humans cause all of the global warming through human-made changes in the albedo, it’s not in conflict at all.

This is part of the reason why so much uncertainty surrounds the subject … people keep talking about the “AGW hypothesis” as though it were a known claim, as if it were a standard, falsifiable hypothesis of the kind we are used to seeing in science.

Instead, it changes with each iteration, and seems to be something on the order of “people caused some unknown amount of the warming of the last fifty years” … hard to be “in conflict” with that, it’s mush.

Perhaps you could fight our ignorance here and spell it out for us.

  1. Exactly what did you mean by the term “AGW hypothesis” above?

  2. Exactly what did the authors of the paper you cited mean by that term? And

  3. Exactly how are their findings about albedo not “in conflict with the AGW hypothesis”?

Many thanks,

w.