The PIPA Report: Americans on Iraq on the Eve of the Presidential Election

Thanks for giving me so much to work with, guys. Work is slow this morning, so I can give this the attention that I think it deserves.

elucidator brings up the following point:

It is not reasonable to infer the same unless you already have a strong prior belief of the same. If you believe that Bush voters are better educated, make more money, and are more intelligent, you would be more likely to doubt this survey on methodological grounds. Things always seem reasonable when they confirm your beliefs.

As far as I am concerned, if it is not demonstrable scientifically, it is not reasonable.

I know you are just being amusing with your characterization of the orthodox church of statistics. However, I find this sort of thing unhelpful. The consequences of building a bridge with faulty equations are obvious and terrible. It is not because engineers are privy to some secret knowledge and arbitrary methodology: they build bridges a particular way because they work. The same is true with statistics. They are done in particular ways because the conclusions can be proven analytically. There is also not a single Orthodox Institution: a Bayesian statistician might come along and have some big issues with my interpretation.

However, he would agree nevertheless that this study is crap.

Mtgman somehow thinks I am arguing against the utility of surveys in general, and that I do not believe that inferences can be made using representative samples.

Yes. However, tools are necessary to draw these conclusions. To return to the bridge example, suppose I build a toll plaza on either side of a river. Toll plazas may be required for the functionality of the bridge, but you cannot cross the river without the bridge itself. The authors of the PIPA study only build the toll plazas and fail to construct the entire bridge.

Mr2001 wants to get me talking some more.

Good question. In a nutshell, I will build a study and give you a little insight into some of the tools a researcher ought to use. Please forgive the upcoming notation; it is a necessary evil. And please don’t get sidetracked by challenging the qualitative argument I am about to make, as it is immaterial.

Suppose I believe that ignorance drives a vote for Bush. I believe that Bush’s campaign spin machine has intentionally kept the electorate in the dark about the real state of the world, that is, we are losing in Iraq, no WMDs, no connection with al Qaeda, etc. I believe that only ignorant people could possibly be favorably disposed towards Bush given the real state of the world, and therefore, ignorant people vote for Bush. I intend to test this theory quantitatively.

I have a limited amount of funds, so I decide I can only survey 968 people. This is fine with me. The equations I will use to calculate the magnitude of the effect of ignorance on electoral choice have lovely asymptotic properties, so 968 is close enough to infinity for my purposes. I can make robust inferences with a sample that size.

Now I need a model. A model in this context is a mathematical representation of electoral behavior. I suppose that there is a True Model out there somewhere in the universe which can explain with perfect clarity and predictive power what drives an electoral choice. I am going to do the best I can to estimate the true model. I believe that the true model looks something like this:

V[sub]i[/sub] = B[sub]0[/sub] + B[sub]1[/sub]X[sub]1i[/sub] + B[sub]2[/sub]X[sub]2i[/sub] + … + E[sub]i[/sub]

Where:

V[sub]i[/sub] is the ith voter’s choice to vote for Kerry or Bush (0 or 1)
B[sub]0[/sub] is a constant
B[sub]1[/sub]X[sub]1i[/sub] is a coefficient (beta) multiplied by the value of an independent variable for the ith voter.
E[sub]i[/sub] is the disturbance term (epsilon), which captures all of the randomness acting on the ith voter that is not explained by the model

NB: This is NOT in fact the model I would use, as it expresses a linear, unbounded relationship between the dependent and independent variables. What I really need is a tool that constrains the output of the above function to lie between 0 and 1, since those are the only choices the dependent variable permits. I would probably use tools called probit or logit analysis, but since it is often helpful to express things linearly, I am sticking to the above for now.

There can be as many X’s as you want. If the number of independent variables is greater than the size of your sample, you will have problems. In this study, I would throw in quite a lot of independent variables in order to control for many sociological factors. I could code yes/no answers to factual questions as 1 or 0, and stick them in as independent variables. I would also code MALE as 1 or 0, education level as an integer (this is hotly debated, but not important here), I would divide the country into regions and code them as a variable, etc.

Then, I would break out my stats package, feed in the numbers, and it would spit out some useful information. It would tell me the magnitude of the effect of each independent variable, expressed as the size of its coefficient, and it would tell me the standard errors associated with all of my estimated parameters. The standard errors, properly interpreted, tell me the probability that a coefficient of that size could have been generated randomly. If there is a low probability that the coefficient could have arisen by chance from some distribution, then the effect the coefficient expresses is probably significant.
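To make that concrete, here is a minimal sketch of that kind of estimation in Python with statsmodels. The respondents are simulated and the variable names (wrong_on_wmd, vote_bush, etc.) are my own placeholders, not anything from the PIPA data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 968

# Simulated respondents with made-up sociological and "ignorance" variables.
df = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "college": rng.integers(0, 2, n),
    "income": rng.normal(50, 15, n),           # in thousands of dollars
    "wrong_on_wmd": rng.integers(0, 2, n),     # 1 = answered the factual question incorrectly
})
# Simulate a vote choice that depends partly on the ignorance variable.
true_logit = -0.5 + 0.8 * df["wrong_on_wmd"] + 0.2 * df["college"]
df["vote_bush"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(df[["male", "college", "income", "wrong_on_wmd"]])
model = sm.Logit(df["vote_bush"], X).fit(disp=0)

# Coefficients, standard errors, z-statistics, and p-values: the "useful
# information" the stats package spits out.
print(model.summary())
```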

Ok, so suppose that ignorance on a few policy issues has a coefficient with some magnitude and statistical significance. I report my coefficients and standard errors, and then I do some interpretation. The question that should be on everyone’s mind is, what happens to the quantity of interest when you control for some of the independent variables?

In this context, “controlling” for an independent variable means holding it constant while you change something else. In the math world, it means you take partial derivatives. Basically, when you estimate the model, it tells you how strong the effects of the independent variables are on the outcome, the dependent variable. So I would take a hypothetical voter. Suppose he is white, male, college educated, makes $50k per year, lives in the Pacific Northwest, and answered correctly on all of the factual policy questions. Suppose he voted for Kerry. What is interesting is to see what happens if you keep his sociological variables constant and change all of his policy answers. To do this, you assign values to his sociological and political X variables, recalculate the above equation, and see what happens to his vote choice. Alternatively, you can do the same thing for a high school dropout who lives in the southwest and makes $10k per year. What happens when you keep his sociological variables the same but change his answers to the policy questions? These are the kinds of things that interest social scientists: what happens to the quantities of interest when you change the independent variables.
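Here is the same kind of toy exercise carried one step further: fit the model, then hold everything else constant and flip only the policy answer to see how the predicted probability moves. Again, the data and variable names are invented for illustration only:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 968

# Made-up respondents: one sociological control and one "ignorance" variable.
df = pd.DataFrame({
    "college": rng.integers(0, 2, n),
    "wrong_on_wmd": rng.integers(0, 2, n),
})
df["vote_bush"] = rng.binomial(1, 1 / (1 + np.exp(-(-0.4 + 0.9 * df["wrong_on_wmd"]))))

X = sm.add_constant(df[["college", "wrong_on_wmd"]])
model = sm.Logit(df["vote_bush"], X).fit(disp=0)

# Two hypothetical voters who differ ONLY in the policy answer; the
# sociological variable is held constant ("controlled for").
informed = pd.DataFrame({"const": [1.0], "college": [1], "wrong_on_wmd": [0]})
misinformed = informed.assign(wrong_on_wmd=1)
print("P(vote Bush | correct answer):", float(model.predict(informed)[0]))
print("P(vote Bush | wrong answer):  ", float(model.predict(misinformed)[0]))
```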

The big question, then, is why do these tools allow us to infer population preferences from a sample?

  1. These tools quantify the effects of the independent variables and provide a means to assess their probability of being close to true using test statistics.
  2. You can use these tools to generate out-of-sample predictions that can be meaningfully tested.
  3. The model generates quantitative hypotheses that can be tested rigorously. SentientMeat did a little “classical hypothesis testing” above. Rather than wave your hands with percentages, this method allows the researcher to test rigorously whether an independent variable has an effect on a dependent variable.

Model specification is where the science of quantitative analysis becomes an art. Every model requires that you make certain assumptions on how both the data and the real world behave. Sometimes these assumptions are reasonable, sometimes much less so. The inclusion or failure to include independent variables can also seriously bias the estimators. Finally, the way the data is coded also implies a host of assumptions that can be challenged. The art is how to get the most bang out of the most innocuous assumptions that you can.

The real kicker here is that when you specify and test a model, you can come to a real conclusion about what forces actually drive the results. From what you know about the relationship between the dependent and the independent variables, you can make inferences about the entire population. In the PIPA survey, the researchers made no effort whatsoever to specify a model to explain the relationship between the dependent and the independent variables. We don’t even know if there is one. There are random forces that drive stuff in the world, and from the results of the study, we do not know whether those stochastic forces correlate with the regressors. If the epsilons correlate with the included variables, you’re pretty much fucked. It means that you have left something very significant out of your model that is biasing your estimators. Since PIPA does not show us any of this data analysis, we simply cannot conclude that ignorance has anything to do with vote choice, since for all we know, something else that correlates with ignorance actually drives vote choice.

I hope this helps. If you are technically inclined, here are some excellent notes by my first grad school quant teacher. He sometimes plays a little fast and loose with the notation, but the information is well presented nonetheless.

Please feel free to assail me with questions.

The big objection I am hearing is “how can you not infer from a sample?”

Of course you can infer from a sample. It’s fun and profitable, and I slaved over a degree in it.

The problem with the study is that its designers don’t, and that people who use the study to bolster their arguments typically draw the wrong inferences. You cannot seamlessly infer that because X% of the sample replied one way, population Y will behave the same way.

I suggested one model for experimental design. There are many, many other possibilities. Some are simpler, others more complicated. One other possibility is to use the data to estimate the likelihood that an ignorant hypothetical voter will vote for Bush using, say, maximum likelihood estimation. The principle of the likelihood function and its maximization is somewhat controversial and the math is harder, otherwise I might have tried to explain that one, too. It can be a purely empirical calculation that requires few if any covariates at all. It is also a valid inferential tool.

Simply pointing to the study and handwaving that there is a large correlation between ignorance and vote choice is uninformative. It does not allow us to make vote choice predictions, it is not testable, and it is anything but quantifiable. It is thus not solid ground from which to draw specific inferences. It is certainly representative of the population if the sample is random, but I submit that it tells us nothing about how a hypothetical voter or an out-of-sample voter will behave, and is hence uninteresting and uninformative.

Do you honestly believe that it tells us nothing? Absolutely nothing?

This has nothing to do with “honest belief.” The issue is statistical interpretation.

The researchers establish no quantifiable, testable relationship between the variables. They test no hypotheses. They assess the magnitude of no independent variables. They control for nothing. They make no effort to ensure that there is no correlation between the stochastic error and the independent variables, or between the stochastic error of the ith voter and the stochastic error of the jth voter.

Other than kinda-sorta confirming what some people already thought they knew and giving others the vague idea that their regional/intellectual prejudices may be justified, this study tells us nothing. Absolutely nothing. About anything that might be interesting or useful in this context.

Maeglin, I don’t say this to be rude or flip, but I’m not sure why you feel the need to amp up the verbosity and jargon and start writing out regression equations. Keep it simple.

First, if they didn’t conduct inferential analyses, then that is that. Nothing can be said about the study other than that in this sample, more people said one thing than said another. However, it would not be complicated at all to do a chi square test or an ANOVA (again, depending on how they constructed their variables). Then they could say that the differences between these groups are meaningful, and that you would expect to find them by chance fewer than 5 times in a hundred, or whatever the results were.

You could then debate about the interpretation of that result – whether there were other factors that would likely have altered or explained the results that were not included, for example. But the point would remain – for these variables, the difference between the groups would not be expected by chance.

By the way, a regression with 968 people and an alpha of .05, including three covariates, would have a power of .97 to detect an overall R-squared change for those three variables of .02, a very small difference, and thus a pretty powerful test, wouldn’t you say?
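For what it’s worth, that figure is easy to sanity-check with a noncentral-F calculation in scipy. This is a back-of-the-envelope sketch using Cohen’s f-squared convention, not a reproduction of whatever power software produced the .97:

```python
from scipy import stats

n, k_added, alpha = 968, 3, 0.05
r2_change = 0.02                        # R-squared attributable to the three covariates
f2 = r2_change / (1 - r2_change)        # Cohen's f^2, assuming little else is explained
df_num = k_added
df_den = n - k_added - 1
f_crit = stats.f.ppf(1 - alpha, df_num, df_den)     # critical F under the null
lam = f2 * n                            # noncentrality parameter
power = stats.ncf.sf(f_crit, df_num, df_den, lam)   # P(reject | the effect is real)
print(round(power, 3))                  # lands in the mid-to-high .90s, consistent with the claim
```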

I’m in the opposite boat unfortunately, so I’ll try to keep it short and to the point.

No, not really. I’m just pointing out that you are demanding a higher standard of statistical analysis than is the norm for political polls/surveys. The formula and methods you outlined, presuming you can get the coefficients and values of the independent variables at a voter-by-voter level (I can’t even really picture the questions needed to get this out of a human about a topic as subjective as politics), are much more rigorous and allow much more precise and accurate projections from the sample to the general population. This is not the norm for this sphere. When was the last time Gallup published a poll with a margin of error of ±.01%?

The bridge is, as I noted above, the concept of a representative sample and the efforts of the polling organization in achieving a random sampling in hopes of maximizing their ability to get a representative sample. It doesn’t give the precision or flexibility your preferred method does, but it is the norm and widely accepted/used in the poll/survey industry. The requirements for rigor vary wildly between areas of study. Your level of analysis would be absolutely the bare minimum for studying catastrophic failure of overpasses, for example. A ±3-5% margin of error just isn’t good enough when you’re talking about a mixmaster coming down in rush hour. You HAVE to know, as definitively as possible, what the chances are so you can plan appropriate maintenance or replacement. You need to know the coefficients and the values of the independent variables which represent structural wear and tear from traffic, erosion, etc. This just isn’t the case in other areas, and the professionals in those fields have developed general standards which define the level of rigor the field uses. Politics doesn’t use the same level of rigor as engineers or finance guys. That’s just the way it is. They construct bridges which are a little wobbly as opposed to the rock-solid bridges engineers build. 6-10% of the time (±3-5%) they will fall during rush hour, but the reality is that most of the time (90-94%) they won’t.

Correlation is not causation. This is absolutely true and you’ll get no disagreement from me. The PIPA report made no such assertion as far as I can tell (I haven’t read the whole thing). A search for the word “vote,” as a whole word or as part of a word, came up with exactly one hit: “The Center on Policy Attitudes (COPA) is an independent non-profit organization of social science researchers devoted to increasing understanding of public and elite attitudes shaping contemporary public policy.” There were no matches for the string “voting”.

The actual assertions of the report seem to be that self-identified Bush supporters are more ignorant of these issues, and of Bush’s stance on them, than self-identified Kerry supporters were of the issues or Kerry’s stance. I have seen nothing which claims the issues of which they were ignorant were the driving factors behind their voting behavior. As I mentioned, the words “vote”, “voted”, “voter”, and “voting” do not appear in the document. Correlation between these findings of disproportionate ignorance and voting choice was left as an exercise for the reader. Thus, and correctly, the onus for supporting the assertion is on the reader, not the report writers. It wouldn’t be the first time a study was misrepresented to score political points.

No interest in assailing. I had more than my fill of that during the siege of Gondolin.

Enjoy,
Steven

There are all kinds of people here, Hentor. I don’t feel any need to amp up verbosity. Some people get something out of equations, some don’t. If the equations don’t do it for you, hopefully my walk-through does. If you think my walk-through smacks of handwaving and obscurity, that’s why I have the equations. I am just trying to CMA.

Exactly. Too bad they didn’t provide their data set or we could have done something like that.

Indeed. There is a lot of fruitful debate there, which is why I suggested some kind of regression to try to figure out what the driving forces are.

I am in the “R-squared sucks as a reporting statistic” camp, personally, but of course I agree. 968 is a fine N from which it is possible to make robust inferences, and your tests of significance are very powerful.
(Edited because the technical stuff is causing my head to ache enough anyways without adding in errors in quote tags. -JMCJ)

Thanks for the code fix.

Sorry to have mischaracterized your position, Mtgman. Let’s try again.

This is a real sore point of mine. I disagree. I am not demanding a higher standard of analysis, just a higher standard of reporting. It is unfortunate that most people simply look at a percentage of a sample, wave their hands, and voila, it is a population statistic. Intelligent, educated people are not frequently trained to think very critically about numbers. This permits all sorts of illogic, bad inference, and sloppy reporting. The way the media reports polls, especially with the ghastly “margin of error” statistic, only makes everyone’s life more difficult. It is very difficult to be stats-savvy in a world in which half-truths are glossed over with misleading statistics. It is a truism that it is easy to lie with statistics. The real truth is that it is easy to lie by not reporting statistics properly.

Exactly. And quantifying certain political variables is actually not very difficult. Party identification, for example, can be coded 0 or 1. There is a very popular ideological variable that, if I recall correctly, is continuous on the interval [-4,4]. You can code any yes/no question as another 0 or 1 variable. You can ask a survey respondent to choose a number between 1 and 10 that expresses his agreement with a given policy. These methods are not without their share of problems and issues, of course. The art, like I said, is to get the most mileage you can out of as few of these assumptions as possible.
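As a toy illustration of that kind of coding (the column names and values are entirely made up, just to show the mechanics):

```python
import pandas as pd

raw = pd.DataFrame({
    "party": ["Republican", "Democrat", "Democrat"],
    "ideology": [-3, 2, 0],                     # a scale continuous on [-4, 4]
    "supports_policy": ["yes", "no", "yes"],    # a yes/no survey item
    "agreement": [8, 3, 6],                     # self-reported agreement, 1 to 10
})

coded = raw.assign(
    republican=(raw["party"] == "Republican").astype(int),          # party ID as 0/1
    supports_policy=(raw["supports_policy"] == "yes").astype(int),  # yes/no as 0/1
).drop(columns="party")
print(coded)
```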

I happen to be a finance guy and an ex-political scientist in training, and I am not sure if I agree with your estimation. :slight_smile:

I think that is a little disingenuous. The correlation is the 800-pound gorilla whose theme dominates the entire presentation of results. Considering that the study was taken before the election, it is not surprising that you don’t find the word “vote.” However, a search for “Bush supporter” or “Kerry supporter” would yield plenty of hits. It is my opinion that the researchers strongly suggest a conclusion that may or may not be supported by the data, and that they present their results in such a way as to privilege that conclusion over any other. You are quite right, they do not actually take responsibility for correlating these findings or trying to map the relationships between the variables. It is my opinion that this sort of exercise would have actually been useful.


Maeglin, I think you’re making this a lot more complicated than necessary. Maybe if you see the problem through an epidemiological lens, you’ll see where some of us are coming from.

In the sort of analysis done in the study, there was no need to come up with a predictive mathematical model. Why go through all that when a simple Chi square test would be sufficient to establish a conclusion? Consider the statement “Kerry and Bush supporters are equally inclined to answer the following questions correctly” as the null hypothesis. So you assemble a representative sample of the voter population, and ask them all the same questions about current events. Then you arrange the resulting data in a 2x2 table like so:

            Bush        Kerry       Total
Right        A            B         A+B = X
Wrong        C            D         C+D = Y
Total      A+C = U      B+D = T     A+B+C+D = W

Where “A” equals the number of Bushites who answered the questions correctly, “B” equals the number of Kerryites who answered correctly, “C” is the number of Bushites who answered incorrectly, and “D” is the number of Kerryites who answered incorrectly.

The chi-square test would require us to contrast the observed outcomes against the outcomes we would expect if our null hypothesis was true.

The expected outcomes would be determined like so:

            Bush        Kerry
Right       XU/W        XT/W
Wrong       YU/W        YT/W

In this study, all it would take to accept or reject the null hypothesis would be to mathematically determine how the observed outcomes stack up against the expected. If the difference between the observed and expected outcomes is significant (as it was in this study), you can confidently conclude that the statement “Kerry and Bush supporters are equally inclined to answer the following questions correctly” does not hold true, based on the sample studied. Further inspection of the data shows that not only is the hypothesis wrong, but that Bushites were more likely to err than Kerry supporters.
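A sketch of that test in Python with scipy, using invented counts in place of A, B, C, and D (these numbers are placeholders, not the PIPA results):

```python
from scipy.stats import chi2_contingency

# Invented counts standing in for A, B, C, D above; not the actual PIPA data.
observed = [[180, 320],   # Right: A (Bush), B (Kerry)
            [300, 168]]   # Wrong: C (Bush), D (Kerry)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p_value:.3g}")
print("expected counts under the null:")
print(expected)
```

If the p-value falls below the chosen alpha (say .05), you reject the null hypothesis that Bush and Kerry supporters are equally inclined to answer correctly.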

In short, the goal here is not to try to predict behavior or even determine causation (which is damn near impossible anyway). We are only looking at association.

Maeglin, I don’t understand, but I want to.

Are you saying we have no reason to think the sample that was selected is representative of the general population?

If not, why should I not conclude from the results of the survey that most people who misunderstand the issues vote for Bush, while most who understand them vote for Kerry?

What mistake would I be making to so conclude?

-FrL-

I agree that a full statistical analysis of the data would involve doing all sorts of fancy things. But, you can get an intuitive idea of how good the data is when you have a sample size of 1000 by just considering the following:

Assume that you flip a perfect coin 1000 times. In other words, you are dealing with a system where the “correct” answer for the full population (in this case an infinite number of flips) is that heads comes up 50% of the time. Now, for the sample size of 1000 flips, we can compute the probability of a certain range of results. And, what we find is that the standard deviation is 15.8 in the number of heads or 1.58% of the total number of flips. This means that ~68% of the time, your sample would get a result of between 484 heads and 516 heads.

If you ask what the probability is that your result lies between 45% (450) and 55% (550) heads, then the result you get is that this would occur 99.84% of the time…i.e., less than 2 out of 1000 times of performing this experiment would you get less than 450 or more than 550 heads.

If you ask what the probability is that your result lies between 40% and 60% heads, then what you get is that a result outside of this would occur 1 in every 4 billion times. Just to give you a feel for what this means, it means that if you performed the full experiment of flipping a coin 1000 times once per second (impossible with a real coin, but possible to simulate on a computer) then you would have to do this for more than 100 years in order to likely find even one case in which the result was that you got less than 400 heads (40%) or more than 600 heads (60%).
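Those figures are straightforward to reproduce directly from the binomial distribution; here is a quick scipy check (nothing here is survey data, just the coin-flip arithmetic):

```python
import math
from scipy.stats import binom

n, p = 1000, 0.5
sd = math.sqrt(n * p * (1 - p))
print(f"standard deviation = {sd:.1f} heads")         # about 15.8

# Probability the number of heads falls in each stated range.
print(binom.cdf(550, n, p) - binom.cdf(449, n, p))    # 450..550: roughly 0.998, as quoted above
print(binom.cdf(600, n, p) - binom.cdf(399, n, p))    # 400..600: the shortfall is on the order of 1 in a few billion
```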

This gives you some rough idea of what sort of percentage differences are significant in a sample size of 1000 people.

I hesitate to dip a toe into this high powered discussion, especially since I took statistics something like 20 years ago, and only because it was required for my engineering major. I’m not totally lost by the discussion but I’d be lying if I said I was following the finer points here.

My question though is this…what exactly was asked by this PIPA survey (I haven’t actually SEEN a link to it nor looked it over for myself…just parts of it)? My assumption is that it’s a series of questions on issues dealing with Iraq and Bush…i.e. the questions are partisan in favor of issues directly relating to Bush (i.e. were there WMD in Iraq, were Saddam and Iraq involved in 9/11, etc). Is this correct? And if so, how would one draw any kind of conclusion from this unless questions were equally asked that were partisan to Kerry’s positions…i.e. things that were incorrect or mis-stated by the Kerry campaign or by other Democrat organizations during the campaign (like will there be a draft, will social security be ‘privatized’, is the economy the ‘worst ever’, etc.)?

I guess what I’m getting at is that the survey seems to tell us only one side of the story (if it tells us anything at all…again, I’m not following all the finer points of the statistics discussion here)…namely, it’s only telling us that Bush supporters skewed towards Bush-type misinformation, without really telling us if Kerry supporters are equally skewed towards Kerry-type misinformation. So…to my mind at least, it doesn’t REALLY tell us if ignorance of the issues is more prevalent on the Bush side than the Kerry side…only that PARTISAN positions of misinformation about Bush are (possibly) more prevalent on the Bush side than the Kerry side on issues related to Bush. To me, that’s no big surprise.

If I’m totally off base here then feel free to disregard my question…as I said, this thread is way over my head, though I’m enjoying TRYING to follow along.

-XT

Which is, of course, the goal of some of the people publishing these sloppy reports. That doesn’t change the unfortunate reality that “poor reporting” (which is very generous of you, by the way; the general term is “spin,” and it is considered a negative term for a reason) is de facto the standard practice for political survey/poll reporting. There are places, specific political study institutes like Cato, where the deeper analysis is done and accurately reported, but it is certainly not intended for a lay audience.

That’s all true of course, but how does one chart ignorance? “On a scale of 1-10, how important is this area that you are ignorant of in your selection of a candidate?” And is it legitimate to track ignorance at all? If they were not ignorant, they may have completely different views as to the priority of this issue. We’ve seen tons of things people get worked up about when they hear a sound bite, but when they take the time to investigate the situation they realize it was not outrageous and it falls off the radar as a non-issue. Case in point: the McDonald’s coffee lawsuit. People who are ignorant of the basic situation and only know “woman won huge lawsuit for spilling coffee on herself” are often outraged. When they hear that McDonald’s deliberately served its coffee dangerously hot to save money, and about the severity of the injuries from the coffee, they often feel the verdict was not a travesty of justice and move on. On the other hand, the ignorance DOES exist and may well influence other decisions, so it should be part of the model. Gahh, I don’t envy anyone who decides to do that level of analysis, although I would probably spend far too much time reading it.

I know, that’s why I used the example of finance guys. :wink: Amend my previous statement to “Politics intended for general consumption doesn’t use the same level of rigor in reporting as engineers or finance guys writing for their peers.” I fully realize there are more detailed studies happening, but John Q. rarely sees them.

Of course, and after reading the “analysis” section of the report I found it pretty clear the authors are doing their best to spin the data. Still, they used all the right weasel words and covered their asses. They never came right out and said people were voting from positions of ignorance. The implications are obvious, at least to me and thee, but at worst they are guilty of the same stuff Bush did when he continued to conflate Saddam’s regime and al Qaeda.

Probably, but the model would be pretty difficult and may have made the surveys overlong, or so dull/repetitive that they would have had fewer respondents. To get B[sub]1[/sub]X[sub]1i[/sub], B[sub]2[/sub]X[sub]2i[/sub]… you’ve got to ask a bunch of questions, probably phrased in a way the respondents have never thought about the issue in, and probably penetrating deeper into the issue than they have probably thought. Asking someone how important an issue is to them is a question with a lot of overhead. A lot of people don’t do that kind of analysis of their own priorities, so when asked to rank them they have to do some soul searching. This both decreases the chance they’ll be interested in taking the now complex survey and shakes up the results a bit, because their current rankings may be “off the top of their head” as opposed to reflections of thought-out positions.

Enjoy,
Steven

On Preview: xtisme, the full report is available from the link in my first post on the phrase “the PIPA report itself”. I am not aware of any information on the actual questions used, in fact that is part of Maeglin’s problem with the report(and I share his disappointment in not having more details about the questions and answers).

Ok, thanks…I missed that link somehow. I’m going to go off and look it over and then just lurk this thread…it’s way over my head.

-XT

I should probably apologize to anyone, and there are probably a few out there, who was annoyed by my overuse of the word “probably.” I probably wouldn’t have used it so much if this weren’t a discussion of probability and the probable inferences from the report, as well as the difficulties they probably would have encountered had they structured the study differently. Still, it is probable that proper survey performance, preparation, and publishing of the paper would have presented fewer problems.

Enjoy,
Daffy Duck

Like you with the face said, you seem to be trying to do lots of cause-and-effect analysis and trying to figure out how much ignorance of the facts actually drives voting behavior and what-not. More power to you.

However, that is quite a bit harder to do. What the PIPA poll shows is simply a correlation. We are not claiming that people vote for Bush because they are ignorant on Iraq. It could be that they are ignorant on Iraq because they support Bush…i.e., they are predisposed to support Bush and thus they tend to ignore facts that go against their predisposition. (This sort of thing is in fact something that PIPA hypothesizes although these sorts of conclusions are clearly on much shakier grounds than simply the data itself.) Or it could be that support for Bush and ignorance on Iraq are both correlated to some third factor like getting their news from Fox. However, this does not negate the correlation that exists between ignorance on Iraq and support for Bush.

That’s right. I don’t read the study as establishing any particular cause/effect relationship. It’s just that I think to myself, upon reading the study, “Shouldn’t it worry me if it turns out most people voting for my guy are ignorant of the basic facts upon which they should be making that decision? Doesn’t that make it likely that I did so?”

Should I not be thinking this after reading such a survey? If not, why not?

-FrL-

Here is the report on the study. Here are the questions asked and the results obtained.

Well, to be honest, I think that your critique here, which is something that I was also sort of wondering about too, is really the only valid critique I have seen of this study…and certainly much more valid than Maeglin’s critique. Of course, it doesn’t negate the fact that Bush supporters are much more ignorant on Iraq (and the other foreign policy matters that were probed in this survey) than Kerry supporters. But, sure, I suppose you could argue that there might have been questions on other subjects that they could have asked where Kerry supporters would have been more ignorant. I am not sure exactly what those questions would be. I think the ones you mention are problematic, i.e., I don’t know how one determines the truth to the question “Will there be a draft?” unless one can predict the future. And, as for privatizing social security, one would presumably want to ask it with a variety of choices for what the candidate’s positions are since Bush does in fact endorse a plan under which a portion of it would be privatized (although one could quibble about what the meaning of the word “privatized” is).

Anyway, I propose that you look at the questionnaire so you can see the full range of questions asked (which cover a fairly broad range of international policy issues). Then you can come back to us and give us some specifically worded questions that have a “right answer” we can generally agree on, where you think that Kerry supporters would likely perform worse than Bush supporters. Then maybe you can try to lobby Knowledge Networks or PIPA to perform such a survey. (Unfortunately, PIPA seems to limit its field of study to “international policy” attitudes.)

Lots of stuff to respond to, once again. Apologies if anything gets lost in the shuffle. If I missed something salient, please feel free to let me know.

Thanks for your suggestions, you with the face. The chi-square is a fine way to do it. I would probably have computed the standard error of the difference between the two random variables, that is, the percentages of Bush and Kerry support, and used this statistic to calculate the probability that one is in fact higher than the other. The epidemiology angle is apt.
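For the curious, that standard-error-of-a-difference calculation looks something like this; the counts are invented placeholders, and only the mechanics are the point:

```python
import math
from scipy.stats import norm

# Invented placeholder counts: number answering correctly in each camp.
x_bush, n_bush = 180, 480
x_kerry, n_kerry = 320, 488

p_bush, p_kerry = x_bush / n_bush, x_kerry / n_kerry
se = math.sqrt(p_bush * (1 - p_bush) / n_bush + p_kerry * (1 - p_kerry) / n_kerry)
z = (p_kerry - p_bush) / se          # how many standard errors apart the two percentages are
p_value = 2 * norm.sf(abs(z))        # two-sided probability of a gap this large by chance
print(f"difference = {p_kerry - p_bush:.3f}, z = {z:.2f}, p = {p_value:.3g}")
```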

Which is, in my opinion, the inherent problem.

Studies that report only association in highly politicized issues are, I believe, unfortunate and intentionally misleading. What do they really tell you: are there more ignorant Bush voters than Kerry voters? Or is the probability of being ignorant higher if you are a Bush voter? Or is the probability of voting for Bush higher if you are ignorant? Or does ignorance merely correlate with something else that really drives the association?

It is one thing to conduct epidemiology studies that are peer-reviewed. Simply associating two phenomena and tossing the results to the wolves is irresponsible, misleading, and, as I argued earlier, not all that informative. I believe that there is more information contained in associating, say, side effects with a medical treatment than support of Bush with ignorance. You have a very strong prior belief that the side effects are driven by the medicine, and you can give placebos to a control group.

In politics, having a similar set of priors is very dangerous and there are no control groups.

jshore, I am not sure if you are responding to me. I agree completely that the size of the survey is more than sufficient. The nice thing about the coinflip example, however, is that you already have a probability distribution over the outcomes of a fair toss. The same is not so true in electoral political analysis.

Plenty from Mtgman. I had no idea you knew what industry I worked in, as I am usually pretty private about that. I probably posted in one of those career threads or something in a fit of excitement when I finally found work. :wink:

And by the way, I think I got it worse in Gondolin than you did.

Indeed. It is very hard to do, though very satisfying.

There are all sorts of ways you could quantify ignorance. I would not quantify it on a 1-10 interval, because that would imply that the difference between 1 and 2 is the same as the difference between 9 and 10. I do not believe this is true: a few correct facts can make the difference between a total ignoramus and someone who knows just enough to be dangerous, while at the upper extremes, the difference between an expert and an EXPERT can be very subtle and the product of years of effort.

I would set it up entirely binary. First, I’d ask people if they think they know the answer to a question. Yes or no, 0 or 1. I would then do just as the surveyors did, and ask them a yes or no question. Again, 0 or 1. In my model, I would estimate not only the marginal effects of thinking you know and being correct, but the interaction between them. I would be especially interested in seeing if thinking you know and being wrong affect electoral choice. I might ask respondents to grade how sure they are on a 1 to 10 interval and study the interaction between confidence, correctness, and electoral choice. That might be an interesting way to test WB Yeats. :wink:
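A rough sketch of what that model could look like, using statsmodels’ formula interface on simulated stand-in data (the names thinks_knows, correct, and vote_bush are my own placeholders, not survey variables):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 968

# Simulated 0/1 codings of "thinks he knows" and "answered correctly"; not real survey data.
df = pd.DataFrame({
    "thinks_knows": rng.integers(0, 2, n),
    "correct": rng.integers(0, 2, n),
})
# Build in a "confidently wrong" effect so the interaction has something to find.
p = 1 / (1 + np.exp(-(-0.3 + 0.6 * df["thinks_knows"] * (1 - df["correct"]))))
df["vote_bush"] = rng.binomial(1, p)

# The '*' expands to both main effects plus the thinks_knows:correct interaction term.
fit = smf.logit("vote_bush ~ thinks_knows * correct", data=df).fit(disp=0)
print(fit.summary())
```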

How one phrases the questions is another art. I do not know much about survey design at all. I could suggest a few books, though.

You make some excellent points about how one structures the survey. The issues you highlight are not trivial. A statistician would respond that all this extra stuff you point out can be captured in the stochastic disturbance term. We would also have to assume that the expected value of the disturbance is 0, that is, that all the errors cancel each other out in the long run. However, if what you are saying is true and the survey itself influences the responses, then we can actually correct for that by altering how we estimate the disturbance.

In other words, there are kinda-sorta fixes and dodges, and there are big problems with using surveys. There is a very good book on this by Adam Berinsky called Silent Voices. It discusses the various analytical problems with survey data in more than a little detail.

On preview, some responses to jshore.

Yeah. A correlation between people who support candidates and their knowledge or ignorance on certain issues. It makes no claims about causation, no claims about the error structure, nothing. Therefore, there are very few conclusions that you actually can draw from this correlation, other than to say, “some correlation exists”. You cannot conclude that “more ignorant people vote for Bush,” since the survey reports no observed behavior nor estimates the likelihood that anyone in the survey will actually vote. You cannot conclude solidly how many more ignorant people support Bush than support Kerry. Hell, Bush could have more non-ignorant total supporters than Kerry.

Other than perhaps some grist for an SDMB debate or to confirm one’s own prejudices, this correlation is not informative. Perhaps some have a lower standard for information. I don’t know.

I heartily disagree with your assessment on both counts.

Perhaps. But I do not believe that these kinds of things would be terribly indicative of the last four years of Bush’s policymaking track record. I bet that Bush voters probably know a lot more about gun control legislation, for example, or perhaps how many times a day he prays. But it would strain my credulity to put those on the same policy level as, say, Iraq.

I’ve run across you in LiveJournal land, where I’m a lurker on the SDMB community and Col’s journal as well as a few others. But yea, I’ve seen enough to know you worked in a big finance house for a while although you mentioned that you didn’t really enjoy it. We share a love of the sword as well.

Yea, but you deserved it. Farking Ecthelion!

Enjoy,
Gothmog