Why is Gallup oversampling Republicans? (Or are they?)

It’s not so much that they’re doing something as that they’re not doing something many of their competitors do, which is massage the raw data to match party registrations based on prior exit polls or other estimates.

Gallup’s stated reason essentially follows what Pew Research said in Vic Ferrari’s post above, so there’s the answer to the General Question portion of the thread.

I completely lack the expertise to weigh the validity of their claim, except to note that the USA Today representative’s claim that Gallup was “clearly the best” in 2002 is incomplete by omission. In the last national election, 2000, they were second best, behind Zogby, who does do the data massage thing (though they were ahead of any number of organizations which also do it).

To add to this . . . or more accurately, to confuse the hell out of this topic . . .

It’s not posted yet on CNN.com (and I’m getting this info secondhand), but Paula Zahn apparently just announced on CNN that the latest Gallup poll has Kerry leading in Ohio by 4% among registered voters, 50-46%.

Ohio trends slightly GOP and has been in the Bush column for quite a while.

However, among Ohio likely voters, Kerry’s behind, 50-48%. Previously, he was behind by 8 points.

Again, this is all secondhand, and I don’t yet have a link to the data.
And I also heard that the CNN lead was, “As you can see, President Bush maintains his lead in Ohio . . .” :rolleyes:

There was that infamous poll back in 1932 that had Hoover beating Roosevelt handily. One small problem: they only surveyed people with phones, and a lot more people went without phones because of the Depression.

I read the article, and it struck me as an excellent answer to some other question. (I’m not sure what the question would be - something about how likely a random person was to change their party ID in a short time.)

The question here is, in groups of millions of people, are the aggregate totals of the members’ party IDs reasonably stable? And Pew’s own numbers suggest that they are. In their annual surveys, involving 19,000 people each year, aggregate party affiliation in the US hasn’t changed by more than 5 points for either party since 1987. And in Presidential election years (1992, 1996, 2000, 2004; they didn’t survey for this in 1988), it comes back to the same place: 34% Dem, 28-29% GOP, the rest independent.

Like I said in a related thread, I think there’s a very good argument here for weighting by party ID. We don’t know exactly what proportion of the population is GOP and what proportion is Dem, but we have a very good estimate from a very large sample, and we know that it’s not particularly volatile.

We certainly know enough about party ID to toss out polls that have absurd party-ID numbers, such as Gallup’s 43R, 31D sample.
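
To put rough numbers on that, here is a minimal Python sketch of what weighting by party ID does to a topline. The 43R/31D sample and the 34D/29R target are the figures quoted in this thread; the independent shares are simply the remainders, and the within-party Bush support figures are invented purely for illustration, not taken from any poll.

```python
# Sample composition: the Gallup party-ID split cited in the thread
# (independents = whatever is left over, an assumption).
sample_share = {"R": 0.43, "D": 0.31, "I": 0.26}

# Target composition: Pew's long-run party-ID figures
# (independents = whatever is left over, an assumption).
target_share = {"R": 0.29, "D": 0.34, "I": 0.37}

# HYPOTHETICAL Bush support inside each group. Made up for illustration only.
bush_support = {"R": 0.90, "D": 0.10, "I": 0.48}


def topline(shares, support):
    """Overall support = sum over groups of (group share * support in group)."""
    return sum(shares[g] * support[g] for g in shares)


# Weight applied to every respondent in a group: target share / sample share.
weights = {g: target_share[g] / sample_share[g] for g in sample_share}

unweighted = topline(sample_share, bush_support)
weighted = topline(target_share, bush_support)

print(f"weights: { {g: round(w, 2) for g, w in weights.items()} }")
print(f"unweighted Bush support: {unweighted:.1%}")
print(f"weighted Bush support:   {weighted:.1%}")
```

With these made-up inputs the topline moves from about 54% to about 47%, which is the whole argument in miniature: the answer depends heavily on which party-ID mix you believe.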

manny, I don’t know what you mean by ‘massaging’ the data, but to me it suggests rather informal means to make the numbers come out the way one thinks they ought to. (If I misread your meaning, my apologies up front.) Weighting to account for the differences between sample composition and known relevant population characteristics is standard operating procedure in professional statistical circles. All those income and unemployment and poverty numbers from Census and BLS and so forth? Weighted, and often in some pretty complex ways. The government statistical shops usually have the benefit of much larger samples than the political pollsters do, but our samples still don’t match what we know about the population from the decennial Census, so we have to weight our samples to conform with reality. If the pollsters aren’t doing this at least with standard demographic data such as sex, race, and age groupings, then I’d assume their results are flawed in ways that make the sampling MOE look trivial.

IIRC, that was Roosevelt-Willkie, 1940. It gets mentioned in a lot of baby-stats texts.

http://www.electoral-vote.com/

“The importance of the difference between (registered voters) and (likely voters) can be seen in a Gallup poll of Ohio conducted Sept. 25-28. Among all registered voters, Kerry is ahead 49% to 46%. Among those voters Gallup thinks are likely to vote, Bush is ahead 49% to 47%. In other words, Gallup thinks Bush will carry Ohio because large numbers of Democrats won’t bother to vote. Needless to say, both sides will strive mightily to get out the vote on election day. As an aside, this poll is the first one I have seen in many weeks showing Kerry ahead in crucial Ohio.”

Nope. Roosevelt-Landon, 1936. Literary Digest published the poll, and folded shortly after the election.

http://historymatters.gmu.edu/d/5168/

Argh! You’re right. I should have Googled before I posted.

Eh, no big deal. Us History graduates just have to pull out the chops every once in a while.

Thanks, John, for the recall. I knew it was one of Roosevelt’s elections.

The Bush crew is treating it as a very tight race, Gallup notwithstanding.

As well they should- even if they were ahead 10 points, treating a race as a ‘done deal’ has led to the downfall of several candidates, not the least of whom was Bush Sr.

This poll from just about a week before the 2000 election shows how far off Gallup has been.

No worries, RT – I use “massage” in this context as a 30,000 foot, value-neutral description of exactly the kinds of stuff you describe. Unmassaged data, particularly when the dataset is something as unreliable as people, is often worthless.

If I can be permitted to “read through” AP’s statement, what I think they intend us to take away is something like ‘we adjust for all kinds of stuff which doesn’t change by large amounts, like income and race and whatnot (remember, a 5-point change on a 35% base is like a ~15% change in that group), and we think that covers us well enough that we don’t have to further adjust for party identification. In fact, since party identification might reasonably be expected to correlate with the answer to our primary question, adjusting for that particular variable might throw off our data.’

Of course, Gallup (and others) are incentivized to be cagey about the exact circumstances of these things, not only for competitive reasons but because of the risk of coming across a semi-numerate editor who asks “why is it news that there’s a 2% change in a poll with a 7% confidence interval” or, Cecil forbid, “why is it news that I have 100 polls on my desk with a 95% confidence level and 5% of them are outliers.” Either of those questions get asked, there’s a lot of pollsters who all of a sudden are talking to their kids about going to State U. instead of Harvard. :wink:
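
For anyone who wants to check the editor’s arithmetic, here is a rough Python sketch. It assumes a simple random sample and uses the textbook normal approximation; the sample size of 1000 is an assumption for illustration, and real polls with weighting have somewhat larger margins than this.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the ~95% interval for a proportion p, simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def margin_on_change(n1, n2, p1=0.5, p2=0.5, z=1.96):
    """Half-width of the ~95% interval for the change in one candidate's share
    between two independent polls of sizes n1 and n2."""
    return z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def expected_outliers(num_polls, confidence=0.95):
    """How many polls you expect to land outside their own interval by chance."""
    return num_polls * (1 - confidence)


n = 1000  # ASSUMED sample size, for illustration only
print(f"margin on one poll:        +/- {margin_of_error(n):.1%}")
print(f"margin on a poll-to-poll change: +/- {margin_on_change(n, n):.1%}")
print(f"expected outliers in 100 polls: {expected_outliers(100):.0f}")
```

Under those assumptions a single poll is good to about 3 points, a change between two polls to about 4, and roughly 5 of the 100 polls on the desk should be outliers with nothing at all going wrong.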

The real problem is simply this: drawing an EPSEM sample via phones has gotten extremely difficult. It isn’t just cellphones, but caller ID blocking, screening of calls, general refusal to participate, etc. Because there is no good way to get a random sample, what you are left with is an attempt to MODEL the sample you actually can get. But this means we are no longer talking about a true poll: we are talking about a long set of assumptions and equations meant to reproduce whatever you imagine the population is supposed to look like.

Clearly, oversampling Republicans is a serious, serious issue with Gallup’s polls. But polls in general have been very screwy this year. We’ve seen in MANY cases swings of up to 11% in some states happen in the complete absence of any major news: something that is just not very likely to represent true movement. To some extent the greater influence of the internet and people’s interest in even tiny scandals and details could explain it… but it can’t explain 11 point shifts. Gallup is really not being very honest about this.

This definitely hurt Gore in 2000, by the way. The polls are now largely acknowledged to have been skewed toward overstating Bush support, but because they reported Gore down, the media emphasis went to “what’s going wrong with the Gore campaign” and “Gore struggles” and so forth. In reality, his campaign was doing really well. But the false polls gave the media the excuse to beat him up, because the “story” of the media narrative follows the perceived horse race. Kerry is being hurt by this in just the same way, though if he pulls ahead even in the skewed polls, as he has in Ohio, it could eventually work in his favor.

And then they’ll work for bosses who are upset when they find out that 40% of their employees’ sick days have been taken on Mondays and Fridays. :smiley:

Actually, they try to get 1010 respondents broken down 52/48 female/male, 12.5% black, 28% Hispanic, etc, etc. If they don’t get that in their 1010 completes, they score the respondents differently, giving more weight to respondents from underrepresented groups. Obviously, they prefer to do this as little as possible, though.
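
The “score the respondents differently” step can be sketched in a few lines of Python. This is only an illustration of the general idea, not Gallup’s actual procedure: the 52/48 target and the 1010 completes come from the post above, while the 560/450 split of completes and the function name poststrat_weights are hypothetical. It also handles one variable at a time; balancing sex, race, and so on simultaneously takes an iterative method that is not shown here.

```python
from collections import Counter


def poststrat_weights(respondents, targets, key):
    """One weight per respondent: target share of their group / observed share."""
    counts = Counter(r[key] for r in respondents)
    n = len(respondents)
    return [targets[r[key]] / (counts[r[key]] / n) for r in respondents]


# HYPOTHETICAL set of 1010 completes that came out 560 women, 450 men.
respondents = [{"sex": "F"}] * 560 + [{"sex": "M"}] * 450
targets = {"F": 0.52, "M": 0.48}  # target split quoted in the post

weights = poststrat_weights(respondents, targets, "sex")

# Weighted share of women should now match the 52% target.
female_weight = sum(w for r, w in zip(respondents, weights) if r["sex"] == "F")
weighted_female_share = female_weight / sum(weights)

print(f"weight per woman: {weights[0]:.3f}")
print(f"weight per man:   {weights[-1]:.3f}")
print(f"weighted female share: {weighted_female_share:.1%}")
```

Men were underrepresented in the made-up completes, so each one counts for a bit more than one respondent and each woman for a bit less, while the weights still add up to 1010.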