Of course it does. That’s where the idea of a *representative* sample comes from. You wouldn’t expect an election poll taken at either of the national conventions to be accurate, would you? Neither do statisticians. A big part of statistics is learning how to construct representative, random samples.

If I want to find out how voters from Illinois feel about the upcoming election, and I decide to poll 1,000 people living in Chicago and no one from downstate, that will probably not be an accurate measure of how the people of Illinois feel. The same goes if I were to poll 1,000 people from downstate and no one from Chicago.

The point is that you must first determine what, exactly, you’re trying to find out. If you want to know how the people of Illinois feel about the upcoming election, you need a representative mix of people from the entire state of Illinois in your sample.

There are several ways of selecting samples. Using my example, you could use simple random sampling, in which you would, say, instruct a computer to randomly generate 1,000 social security numbers originating in Illinois (and then generate more as needed to replace those who have moved out of state, who are too young to vote, etc.). This would (assuming your random number generator is sound) generate a purely random sample of citizens living in Illinois who have a social security number.
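Here is a minimal sketch of that idea in Python. Everything in it is an assumption for illustration: the `frame` records are made up, and `simple_random_sample` and `is_eligible` are hypothetical names. Instead of drawing numbers and replacing the ineligible ones afterwards, it filters out ineligible people first and then draws, which should give every eligible person the same chance of being picked.

```python
import random

def simple_random_sample(population, k, is_eligible=lambda person: True, seed=None):
    """Draw k eligible members uniformly at random, without replacement."""
    rng = random.Random(seed)
    eligible = [p for p in population if is_eligible(p)]
    if k > len(eligible):
        raise ValueError("not enough eligible people to fill the sample")
    return rng.sample(eligible, k)

# Hypothetical sampling frame: 10,000 made-up records, not real data.
frame = [
    {"id": i, "age": 10 + i % 70, "in_state": i % 10 != 0}
    for i in range(10_000)
]

# Keep only people old enough to vote who still live in the state.
sample = simple_random_sample(
    frame,
    1_000,
    is_eligible=lambda p: p["age"] >= 18 and p["in_state"],
    seed=42,
)
```

The fixed `seed` is only there so the draw can be repeated; a real survey would not hard-code it.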

Another method is called “cluster sampling.” In this method, again using my example, you would divide the state up into parts. You could use counties. You would then take a random sample of those counties (by, say, assigning each one a number and generating ten or fifteen numbers), and do your survey only in the counties you selected. The disadvantage of this is that the distribution of counties may not be representative of the distribution of people. If there are 5 heavily populated counties and 95 rural counties, odds are you would randomly pick few if any heavily populated counties, even though a large share of the population may live in them. Thus the sample would not be representative.
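To put a rough number on that disadvantage, here is a small sketch using the 5-versus-95 split from above. The county names and population figures are invented for illustration, and `cluster_sample` is a hypothetical helper, not a library function.

```python
import math
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters whole clusters uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(clusters, n_clusters)

# Hypothetical counties: 5 heavily populated, 95 rural (made-up populations).
counties = [
    {"name": f"county_{i}", "population": 1_000_000 if i < 5 else 20_000}
    for i in range(100)
]

chosen = cluster_sample(counties, 10, seed=1)

# Chance that a draw of 10 counties contains none of the 5 big ones.
p_no_big = math.comb(95, 10) / math.comb(100, 10)

# Share of all residents who live in the 5 big counties.
total_population = sum(c["population"] for c in counties)
big_share = sum(c["population"] for c in counties[:5]) / total_population
```

With these made-up numbers, `big_share` comes out above 70% while `p_no_big` is a bit under 60%, so more often than not the survey would skip the places where most people live.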

Yet another method of sampling is “stratified random sampling.” In this method, you again attempt to break up the population into meaningful groups which would likely have different opinions about whatever it is you’re trying to find out. Perhaps for my example we could split Illinois into low-income, medium-income, and high-income groups. In each of these groups we could do a simple random sample of the same size, and that way we could be sure that each group is equally represented. “Aha!” you say. “What if the groups are not equal in the overall population?” Then you should proportion your sample to reflect that. If Illinois has a 20% high-income, 50% medium-income, and 30% low-income population, your sample should reflect that. For a sample of 1,000, randomly select 200 people from the high-income group, 500 from the medium-income group, and 300 from the low-income group. In this way, each of those three groups would be accurately represented within your sample.
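The proportional version can be sketched like this, using the 20/50/30 split from above. The function names and the made-up `strata` lists are assumptions for illustration. Percentages are kept as whole numbers so the arithmetic stays exact, and any rounding leftovers go to the groups with the largest remainders.

```python
import random

def proportional_allocation(percentages, total):
    """Split total across strata in proportion to whole-number percentages."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")
    alloc, remainders = {}, {}
    for name, pct in percentages.items():
        alloc[name], remainders[name] = divmod(total * pct, 100)
    # Hand out any seats lost to rounding, largest remainder first.
    leftover = total - sum(alloc.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, percentages, total, seed=None):
    """Run a simple random sample inside each stratum, sized proportionally."""
    rng = random.Random(seed)
    alloc = proportional_allocation(percentages, total)
    return {name: rng.sample(strata[name], alloc[name]) for name in alloc}

# Hypothetical population: made-up person IDs grouped by income level.
strata = {
    "high": [f"high_{i}" for i in range(2_000)],
    "medium": [f"medium_{i}" for i in range(5_000)],
    "low": [f"low_{i}" for i in range(3_000)],
}
percentages = {"high": 20, "medium": 50, "low": 30}

sizes = proportional_allocation(percentages, 1_000)
sample = stratified_sample(strata, percentages, 1_000, seed=3)
```

Here `sizes` holds the 200, 500, and 300 from the example, and `sample` holds that many randomly chosen people from each group.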

Finally, there is “multistage sampling.” This is exactly what it sounds like: You combine the previous methods to refine your sample further.

For example: Determine whether each county in Illinois is high-population, medium-population, or low-population (stratified sampling). Randomly select 20 counties, in proportion (so if 3/5 of Illinois counties are medium-population, however you define that, 12 counties should be medium-population, etc.). In each of these counties, determine income levels and randomly sample 50 people from each county, in proportion with their income (stratified sampling again). This method of sampling would ensure that your final sample was proportionally distributed among heavily-populated and sparsely-populated counties, and that within those counties, your sample was proportionally distributed across income levels. Basically, people from both dense and sparse areas are fairly represented, as are people of various income levels. That’s an example of multistage sampling using stratified sampling twice. There are other ways of doing it.
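Here is a rough sketch of that two-stage scheme. All of the data is invented: 100 hypothetical counties, 60 of them medium-population to match the 3/5 in the example, each with the same made-up income mix. The helper names (`allocate`, `stratified_sample`, `multistage_sample`, `make_county`) are mine, not from any library.

```python
import random

def allocate(counts, total):
    """Split total across groups in proportion to their sizes."""
    grand = sum(counts.values())
    alloc, remainders = {}, {}
    for name, count in counts.items():
        alloc[name], remainders[name] = divmod(total * count, grand)
    # Hand out any seats lost to rounding, largest remainder first.
    leftover = total - sum(alloc.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(groups, total, rng):
    """groups maps a stratum name to its members; sample each proportionally."""
    alloc = allocate({name: len(members) for name, members in groups.items()}, total)
    picked = []
    for name, k in alloc.items():
        picked.extend(rng.sample(groups[name], k))
    return picked

def multistage_sample(counties_by_size, n_counties, per_county, seed=None):
    rng = random.Random(seed)
    # Stage 1: counties stratified by population class.
    chosen = stratified_sample(counties_by_size, n_counties, rng)
    # Stage 2: residents of each chosen county stratified by income.
    return {
        county["name"]: stratified_sample(county["residents_by_income"], per_county, rng)
        for county in chosen
    }

def make_county(name):
    """Build a hypothetical county with a made-up 20/50/30 income mix."""
    sizes = {"high": 200, "medium": 500, "low": 300}
    return {
        "name": name,
        "residents_by_income": {
            level: [(name, level, i) for i in range(n)] for level, n in sizes.items()
        },
    }

counties_by_size = {
    "high": [make_county(f"high_{i}") for i in range(20)],
    "medium": [make_county(f"medium_{i}") for i in range(60)],
    "low": [make_county(f"low_{i}") for i in range(20)],
}

result = multistage_sample(counties_by_size, n_counties=20, per_county=50, seed=7)
```

With these inputs, `result` maps 20 county names to 50 sampled residents each, and 12 of those counties come from the medium-population group.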

As I said, a big part of statistics is learning to construct representative samples. This is a bit simplified (and possibly wrong on some details, as I’m not a statistician; I’ve just studied it), but the general idea is there.

To sum up, yes, it matters a GREAT DEAL where you sample from, which is why great efforts are undertaken to ensure representative, random sampling.