Census Statistical Sampling

I’ve never been 100% clear on what this “statistical sampling” stuff is with the census. I’m steering clear of debate territory here (I hope); I just want to ask a few questions of fact:

  1. Somehow, this method is used to estimate the undercount. How do they know that people were undercounted? What sample do they take which is known to include people not counted in the census?

  2. If this sampling (not the adjustment, I understand that that’s merely a mathematical guess, I’m referring to the sample by which the undercount is estimated) is able to find and count those people, then why not conduct the census that way?

I’m just going from memory here, from when a Census Bureau person gave a talk at my workplace:

  1. The Census Bureau has already done estimates of all the tracts, so it knows how many responses it should get in its mail and in person canvasses. Presumably, smarter people than I can figure out how many need to be added. I know that you can find out the actual number of “persons” added to the count later. They will be listed under “imputed persons” or something like that. They are always listed way at the end.

  2. I’ve always heard the arguments about sampling go this way:
a) Joe Republican: The Constitution says that the census has to be “an actual enumeration” of the people. That means a head count. No guessing.
b) Joe Democrat: But the Census always misses a high percentage of certain groups and they don’t get adequate representation. Since we can use math to figure out who we missed, why don’t we count them?
c) Joe Republican responds: But if we guess on the Census, why don’t we just do estimates to see who will win elections?
d) Joe Democrat says: Could it be that you are afraid that the Census is undercounting potential Democratic voters more so than Republicans?

And so on and so on.

So, we change the Constitution. It’s been done before.

Questions from the OP:

  1. You can always do a little more work, at anything. That can give you a statistical basis.
  2. That little more work, scaled up to the entire census, would be prohibitively expensive.

::Statistician enters the room::

I actually took a course on Survey Sampling Designs, wherein we devoted quite a bit of time to the census. Of course, I don’t remember everything from the course but I do know a few things…

  1. What BobT said in #1 is correct in terms of what is expected.

  2. Because it is a census, everyone must be counted. Statistical imputation is much closer to counting everyone than not.

  3. Statistical imputation is MUCH CHEAPER than other methods, regardless of the scale. Non-respondents and non-contacts must be found by a person walking the street and knocking on doors. This person costs money each hour. The statistical imputation is faster and doesn’t get reimbursed for mileage.

  4. Statistically speaking, our direct census method misses more inner city poor people. There is an enormous correlation between race and income in inner city neighborhoods where under-representation is a problem. So, yes, the populations under-represented are similar in make-up. (It is not unreasonable to assume that they may have similar political ideologies.)

The census bureau has a very nice website and addresses this topic.

Statistically Speaking:

The average human has about one testicle and one breast. (Paraphrased from Statistics 101)

I would think from your argument that it would be cheaper to count the inner city. More people per unit area. The cost of door-to-door people would be cheaper per residence if the residences are close together. It would be more costly to count those that don’t respond in rural communities where folks live on many acres. Perhaps your statistics are wrong, and maybe the rural people are undercounted.

OK, a house in a rural area is more likely to have a resident than an apartment in the inner city. Yes, this is a general statement, but I use it merely to demonstrate the point. IF the census bureau was ONLY using face-to-face interviews as its means of data collection, you would be correct. However, being the most expensive design, F2F is used only in high density areas where mail-backs may be low. Rural area non-respondents (blanks) are not as “likely” as inner city blanks. There are lots of tenement/apartment buildings that sit on census tracts, with valid mailing addresses, etc., that do not have a resident at the time of counting. The census bureau does not know if an apartment is vacant or not until they check it out. This requires the personal visit, often multiple visits, in case the person is not home. This is why the under-representation is more “detrimental” in inner cities. Further, the interviewer can possibly have an entire demolished apartment building removed from a tract with one visit. This clears up hundreds of blanks.

As Mashie stated, it is cheaper to conduct interviews in an inner city than in the rural country, but my quoted post does not mention rural country at all.

BTW, your quote about the one testicle and one breast is cute, but actually goes to show the huge problems associated with untrained people using statistics incorrectly. Obviously, the original speaker of this quote does not understand the concept of cohort selection.

Ahhh… or the original speaker of this quote may understand that many people who use statistics to back up their arguments may not:
A. Understand cohort selection
B. Be sure that the group doing the statistics of which he/she quotes has applied the correct statistics

You may not have mentioned the rural population directly; however, you said “Statistically speaking, our direct census method misses more inner city poor people.” I assume you meant in relation to people not in the inner city, or you would not have used that description.

Sampling for Congressional districts would help the Democrats. Naturally they support it, and Republicans oppose it. There is a Constitutional requirement of an “actual enumeration”, which could be interpreted as barring the use of statistical sampling. However, a couple of years ago, when sampling was considered by the Supreme Court, they sidestepped the Constitutional question. They decided that Federal law prohibits sampling for this purpose.

If Federal law were changed to allow sampling, then the Supreme Court might have to interpret the Constitutional wording. The theory of statistical sampling did not exist when the Constitution was written, so it’s anybody’s guess how the Supreme Court would rule.


  1. One way to measure undercounts is take a sample of your sample, and go to those districts, and do a complete population count by more expensive complete means. Then you see how close your estimate was, and extrapolate that to your full sample. That is, check your answer on a couple districts, and see how close you are. I don’t know if this is the method actually used. You could also check it against any number of other surveys and censuses (censii?) that are done for any reason.

  2. The reason the census is not conducted that way is that the Constitution specifically says to use an “enumeration”. That means counting people one by one.
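To make the "check your answer on a couple districts" idea in point 1 concrete, here is a minimal sketch in Python. It is only an illustration of the approach described in this post, not the Census Bureau's actual procedure. The function name, the tract counts, and the overall total are all made-up numbers for the example. It assumes the intensive recount of the sampled tracts is the "true" count and that the sampled tracts are representative of the rest.

```python
def estimate_undercount(census_counts, recount_counts, census_total):
    """Ratio-style correction based on a recount of a few sampled tracts.

    census_counts  -- ordinary census counts for the sampled tracts
    recount_counts -- counts from the more expensive, more thorough recount
    census_total   -- ordinary census count for the whole area
    Returns (adjusted_total, undercount_rate).
    """
    if not census_counts or len(census_counts) != len(recount_counts):
        raise ValueError("need matching, non-empty lists of tract counts")

    sampled_census = sum(census_counts)
    sampled_recount = sum(recount_counts)

    # How much larger the thorough recount is than the ordinary count.
    correction = sampled_recount / sampled_census

    # Extrapolate that correction to the whole area.
    adjusted_total = census_total * correction

    # Share of the (estimated) true population that the census missed.
    undercount_rate = 1 - census_total / adjusted_total
    return adjusted_total, undercount_rate


# Hypothetical numbers: three sampled tracts and an area-wide census total.
census_counts = [950, 1900, 2850]
recount_counts = [1000, 2000, 3000]
adjusted, rate = estimate_undercount(census_counts, recount_counts, 95_000)
print(f"adjusted total: {adjusted:.0f}, undercount rate: {rate:.1%}")
```

In practice the correction would presumably be worked out separately for different kinds of neighborhoods, since the thread's whole point is that the miss rate is not the same everywhere; this sketch uses a single ratio to keep the arithmetic visible.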

Statistical sampling wasn’t known at the time of the framers. It is a superior methodology in terms of biases, cost, and accuracy. In terms of math and science, there is no question that we should use statistical sampling. All of that counts for exactly spit when it comes to the Constitution.

It also just so happens that the current system tends to favor the Republican Party, so they are strictly constitutional on this issue. Changing it to sampling would give an advantage to the Democratic Party, so they are generally in favor of sampling.

I was a Census Crew Leader in downtown and south Memphis in 2000. This is about as “inner city” as you can get. From hard experience, I can say that there is absolutely no way that we can count everybody, whether face to face or by mail. People moving in and out of apartments and houses around April 1, people’s work schedules, bizarre personal habits (don’t ask), suspicion of anybody that works for the government (which is understandable when you consider that the only reason a government employee comes to some of these neighborhoods is to arrest somebody), and just plain orneriness mean that we’re going to miss some folks. If I had a nickel for every time I ended up saying “just tell me how many of you there are in there” to a closed and locked door, I’d have four or five bucks by now. But, having said that, I think that we did a really great job of counting as many people as possible. I don’t know what the official numbers were, but I’d say that we found 99% of the people who lived in my district on April 1. The rest were people who didn’t want to be found. And I say screw them, anyway.
I’m as liberal as they come, but after my experience this spring and summer, I’m against statistical sampling in the Census. While there is room for improvement in the system, I still think that an actual head count is the best way to do things. I’d love to see a comparison of our numbers from the district with a statistical sample–I’ll bet we were pretty damn close.

Spritle–Any address, rural or urban, that did not return its census form got a knock on the door from an enumerator. F2F interviewing of nonrespondents was used everywhere, not just in the cities. And several times we ran into exactly the situation you described–an apartment building had been torn down and we had a whole slew of Non-Response Follow Up interviews to do. Sometimes that resulted in getting 50 cases complete in an hour. On one occasion, when a new building had been built in place of an old one, it cost me three weekends in a row to get the whole thing sorted out.

A big part of the problem is that the Constitution calls for an enumeration, which is an actual count, and we don’t do that.

For instance, there was a big stink in Chicago: census takers were going to high rise apartments and no one would let them in, so after two tries they marked the building vacant. Some of these were over 50 units. It is clear that people occupy those buildings.

This is not an enumeration. So we currently aren’t doing an enumeration which is what the Republicans say we should be doing.

It is interesting to note Chicago clearly has gained back some people. We should see this on the census. If we don’t we know something is up.

Republicans (well, a few anyway) went so far as to say don’t answer the questions on your census. Some census workers weren’t counting questionnaires unless they were filled in completely. Some took this to mean don’t send back the questionnaires at all.
Not only is there a Dem vs Rep issue there is a rural vs urban issue.

BTW Detroit is predicted to be the first city to have a million people then fall below that number. So now we only have 9 cities predicted over 1 million people in the USA

(NYC, LA, Chgo, Houston, Philly, San Diego, Phoenix, San Antonio and Dallas – and don’t think San Antone didn’t rub it in when they passed Dallas)

You mean other than Cleveland?

Meanwhile, there are two different discussions going on here. The first is “How does the census bureau take the census, and why does it use that method?”. This is a General Question. The other discussion is “Should they be using those methods?”. This is a Great Debate. It’s a fine distinction, I know.

So the basic idea that I’m seeing is that it is known what methods will result in a comprehensive count, but that such methods are considered prohibitively expensive?

Hmmm. I wonder if a case can be made that without allocating the money for a count that is properly complete, the count that is taken is not a constitutional “enumeration.”

Just a thought.

Chaim Mattis Keller

Of course, we’ve recently seen the humongous problems associated with trained people using statistics incorrectly.

And I don’t think that that is all that obvious either.