Say I want to know what percentage of people are doing a yes/no thing. Say people wearing hats vs not wearing hats. Or wearing white shoes vs black or any other color shoes. My inclination would be to just take out a piece of paper and start tallying yes/no for everyone I see walking down the street. But how many people do I need to note data for to get reasonably accurate results, and is there a way to tell how accurate it is based on how many people I’ve noted data for?
You’re talking about sampling, and it’s complicated! First of all: what population (group of people) are you trying to measure? You didn’t say in the OP. Americans? Or some smaller group?
That’s called “convenience” sampling, and using that method alone you’d have a hard time drawing meaningful conclusions about a population much larger than “the people I sampled.” No matter how large your sample!
I.e., after running your sample for a while, you could say with confidence that the 5000 people you saw walking on Your Street at such-and-such dates and times were wearing hats 20% of the time. But you couldn’t extrapolate from that (impressively large!) sample to the hat-wearing behavior of Americans, or residents of Your State, or of Your Town, or even really of walkers on Main Street “in general.” The relationship between your sample and these larger populations isn’t (I’m assuming) well-defined.
Start with defining the population you want to survey; that’s going to affect the methods that will work and the sample sizes you’ll need.
And then suppose that you do pin down what population you’re interested in, and you find a way to do a good job of randomly sampling them. How many people do you need to survey now? Well, that depends on the margin of error you want. Roughly speaking, the needed sample size is about one over the square of the margin of error. So, for instance, if you want a margin of error of 5% (or 1/20), then you’d need to survey about 400 people. Notably, this does not depend on the size of the whole population: Assuming that your random sampling really is good (which is a big if), then you can sample 400 people in your neighborhood to learn about your neighborhood’s haberdashery habits, or you can sample 400 people out of all of the US to learn about the US’s hat-wearing.
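To make that rule of thumb concrete, here’s a quick sketch in Python (just the approximation from the post above, with the same example numbers):

```python
import math

def rough_sample_size(margin_of_error):
    """Rule of thumb: needed sample size is about 1 / (margin of error)^2."""
    return math.ceil(1 / margin_of_error ** 2)

print(rough_sample_size(0.05))  # 400 -> a 5% margin needs ~400 people
print(rough_sample_size(0.01))  # 10000 -> a 1% margin needs ~10,000 people
```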
Your problem is that, even though your method SEEMS random, it isn’t. The only people you will survey are the ones who happen to be walking on that particular street on that particular day.
The old way was to call every Xth name in the phone book. If you want to do it in person, you need to set up some sort of grid, say one street out of every mile, across the city. And do it probably both morning and afternoon. If you’re lucky, your sample will be random enough to be a reasonable approximation of the general population.
As others point out, it may be hard to design a representative sample. For example, a telephone survey will get different results depending on what time of day you call, or whether you call mobile phones or land lines.
For this post let us assume that bias is not an issue — you are somehow confident that your sample truly is unbiased. And assume(*) that the survey results are in the ballpark of 50-50 (i.e. about half of those surveyed answer ‘Yes’ and about half answer ‘No’). Chronos gave the correct answer:
More generally, and more precisely, if you have an unbiased survey of 384 people and 192 of them (exactly half) do wear hats, you would say
There is a 95% chance that the percentage of hat wearers in the total population is between 45% and 55%. (IOW, the margin of error is 5%.)
As Chronos implies, quadrupling the sample size would cut this margin of error in half. The general formula is
Margin of Error = 1.96 √(p * (1-p) / N)
where N is the sample size and p is the proportion of ‘Yes’ responses (.5 in our example). The constant 1.96 is found from this table, where z=1.96 maps to .475, which is half of 95%. (You can see in the table that 2.575 should be substituted for 1.96 if you want to replace “95% chance” with “99% chance.” Or replace “95%” with “95.45%” to use Chronos’ 400-sized sample instead of 384.)
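For anyone who wants to play with the numbers, the formula above translates directly into a few lines of Python:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of Error = z * sqrt(p * (1 - p) / n).
    z = 1.96 for 95% confidence, 2.575 for 99%."""
    return z * math.sqrt(p * (1 - p) / n)

print(margin_of_error(0.5, 384))   # ~0.050: the 384-person hat example
print(margin_of_error(0.5, 1536))  # ~0.025: quadrupling the sample halves it
```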
(*) Survey “margins of error” are generally reported as a single number and assume p = 0.5 (the worst case). This can lead to nonsense. For example a survey of 384 would report
50% of the population wear hats (margin of error ±5%)
20% of the population wear skirts (margin of error ±5%)
1% of the population wear corsets (margin of error ±5%)
The hat figure is correct, as we’ve seen. The skirt figure should be ±4% (as the above formula yields) instead of ±5%. The quotation for corset wearers is just silly (what is 1% minus 5%?) and even the formula above is imperfect due to asymmetry.
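A common fix for that asymmetry (my addition, not something anyone above proposed) is the Wilson score interval, which stays inside 0–100% even for rare answers like the corset case:

```python
import math

def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a sample proportion p of size n.
    Unlike p +/- z*sqrt(p*(1-p)/n), it never leaves [0, 1] and is
    asymmetric around p when p is near 0 or 1."""
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    halfwidth = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - halfwidth, center + halfwidth

print(wilson_interval(0.01, 384))  # ~(0.004, 0.026): 0.4%-2.6%, not "1% +/- 5%"
```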
As other posters have said, the problem in doing good surveys isn’t primarily sampling enough people. The problem is finding a way to survey a group of people who are a random set of the entire population that you want to survey. That’s very difficult, and it’s what separates a good survey from a bad one. Once you do that, you can easily calculate how accurate your survey is. If you’ve surveyed 10,000 people (and they are really a random set of the entire population), you just quickly calculate that 10,000 is the square of 100 and 1/100 is 1%, so your survey is close enough to being accurate within 1%.
As someone who works in marketing research, I deal with this often. What everyone else has said is bang on.
First, in terms of sample size - we commonly ask if clients want their results to have statistical validity across the country, the province, the city or a specific neighbourhood in the city (we do a lot of work for retailers who want to better know the area around specific stores). Vastly different sample sizes are required for validity at the national level versus down at the neighbourhood level.
Similarly, do they want validity in a sample of “adult women 18+” versus “women aged 30+” versus “35-year-old women”? Ultimately the client decides, usually balancing the desire for more detail against their budget.
Second, although it sounds like “convenience samples” are not good research, in reality they do play an important role, typically in gathering background information before main research is conducted.
Following along the OP’s example, if my client wanted a detailed look at shoe colours worn in Miami, but didn’t have any info beyond that, I’d conduct preliminary research prior to a full study. The convenience sample would be done where we agreed was “somewhat representational”.
As an example, I’d like to get an idea of whether we are dealing with 3 colours or 30 colours. Also, since we’d likely be doing the main study observationally, I’d be taking photos so all my field team knows exactly how to correctly classify colours (if required): “red” - light red, dark red, cherry red, burgundy, etc. It’s obviously important that all the field team is consistent in how they report.
I think I must be misunderstanding what you’re saying here: It sounds like you’re saying that the needed sample size depends on the population size.
I think what he meant is the sample size for “national” is one (fairly small) quantity.
But the sample size for “separate samples for each zip/postal code nationwide” is a different and much larger number. Because it’s actually thousands of different separate sampling operations using a common definition. Each of which needs that same fairly small quantity to achieve the same confidence intervals.
If I’m a company with national scope trying to decide where to build new stores or spend money on local advertising, only the second question will bring me actionable decision-making info. An answer to the first question will be useless.
Obviously a real-world company can use off-the-shelf demographic data to pre-filter out any postal codes that are no-hopers. No need for yacht advertising in Appalachia.
For example - whether your sample is wearing black shoes, white shoes or colored shoes - are you walking around Wall St., a university, a suburban mall, a sports stadium etc.? That choice biases the sample. Similarly for hat/no hat. Near a sports stadium on game day, or a university - different result than Wall St.
(There’s the old joke that the polls predicted “Dewey defeats Truman” by phoning a random population sample. Well, in 1948 phone ownership was still a biased sample and the ones without phones - poorer - apparently voted Truman.)
And nowadays, we again have a large number of people without landlines, and the distinction also tracks with some political tendencies, and that causes polling companies a lot of headaches, too. Well, at least the polling companies who care about getting it right.
Another market research professional weighing in (and agreeing with the previous replies).
This is exactly it. If you only care about understanding the answers to your survey for the overall population, then you don’t need a terribly large sample – in most cases, a sample of 300 to 500 respondents is sufficient to give you a good, statistically stable base (assuming, as others have noted, that you’ve been able to build a sample that is, in fact, reasonably representative of your population).
But, if you want to look at your responses by sub-groups (such as, by individual cities, as in LSLGuy’s example, or by age, sex, income level, etc.), then you’re going to want to make sure that you have enough respondents in each of those sub-groups…and that means that your total sample / total number of interviews will need to be larger.
And of course, since truly random sampling is difficult, and since we live in the real world, pollsters will sometimes try to do better than random. They’ll pick a set of respondents, and make sure that they have the right proportion of all of the particular demographics they’re interested in. This can work sometimes… but not always, and you can’t always tell when it will fail. For instance, maybe you missed that one particular demographic category is important, and got too few or too many of that demographic in your hand-picked sample. Or maybe you have all of the right categories, but the wrong number of some of them (this is what Karl Rove’s “unskewed polls” was about: He saw how many Democrats supported Obama vs. Romney, and how many Republicans did, and took the proportion of Democrats and Republicans in the polling areas and did the math… except there were actually fewer Republicans and more Democrats than he thought). Or maybe you have, say, the right proportion of blacks, and the right proportion of church-goers, but you don’t have the right proportion of black church-goers (this is the “intersectionalism” that you’ll sometimes hear talked about).
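As a toy illustration of that reweighting step (every number here is hypothetical, just to show the mechanics), you can see how the headline figure moves when the pollster’s assumed party mix differs from who actually answered:

```python
def weighted_support(support_by_group, group_shares):
    """Overall support rate, weighting each group's rate by its assumed share."""
    return sum(support_by_group[g] * group_shares[g] for g in support_by_group)

# Hypothetical support for a candidate, by party ID:
support = {"Dem": 0.90, "Rep": 0.07, "Ind": 0.45}
raw_sample_mix = {"Dem": 0.38, "Rep": 0.32, "Ind": 0.30}  # who actually responded
assumed_mix = {"Dem": 0.33, "Rep": 0.37, "Ind": 0.30}     # pollster's assumption

print(weighted_support(support, raw_sample_mix))  # ~0.499
print(weighted_support(support, assumed_mix))     # ~0.458: wrong mix, wrong answer
```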
Because truly random sampling is difficult, some sort of systematic random sampling is usually the way to go. Say you’ve got a list of 120,000,000 addresses or telephone numbers you’re going to choose your sample from, and you’re going to sample 40,000 of those addresses/phone numbers, you divide 120,000,000/40,000 = 3,000 to get your sampling interval or take-every, and then you randomly* choose a number between 1 and 3000 as your random start. If your random start is 1743, you select the 1743rd, 4743rd, 7743rd, … , 119,998,743rd address or phone number on your list.
If there’s a particular demographic you want to ensure isn’t over/underrepresented, before you select your sample, you sort the list in a way that tries to bunch addresses/phones of persons more likely to fit that demographic towards one end of the list. For instance, if you want minorities to be accurately represented, you could rank the counties or area codes by % of nonwhite, and sort the addresses by county/area code.
*Well, more or less randomly. :shrug:
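Here’s a minimal sketch of that take-every procedure, using the numbers from the example (the list of 120,000,000 addresses/numbers is assumed to exist and already be sorted however you like):

```python
import random

def systematic_sample(list_size, sample_size):
    """Systematic random sampling: fixed interval, random start.
    Returns 1-based positions into the sorted list."""
    interval = list_size // sample_size   # 120,000,000 / 40,000 = 3,000
    start = random.randint(1, interval)   # random start, e.g. 1743
    return [start + k * interval for k in range(sample_size)]

positions = systematic_sample(120_000_000, 40_000)
print(positions[:3], positions[-1])  # e.g. [1743, 4743, 7743] ... 119998743
```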
If you have that list of 120 million phone numbers, and you can rely on getting a response from every phone number you call, then it’s really easy to just randomly select whatever number of them you want. But both of those ifs are significant. You might have the list of phone numbers, but you’ll almost certainly get a lot of hang-ups, busy signals, and “No, I’d rather not participate in your survey”. And if you just throw those out, then you’re risking the possibility that the set of people who don’t respond have some bias on the question you’re interested in (like, maybe people who support one candidate are more likely to hang up on pollsters).
Thanks for clarifying - the comments are correct.
We can do a “statistically accurate” national sample with 300+ people, but that would not have accuracy for sub-groups within the sample, whether those groups are based on demographics or geography. To be “statistically accurate” for those, we would have to increase each sub-group’s sample size to much higher levels (100+ each). That can increase the cost dramatically.
As far as randomness goes, it’s a moot point since it really doesn’t exist in any research. The very fact that someone is willing to talk to you and answer questions means that at best you’re considering the opinions of “people willing to participate” in a survey. In observational research, as a couple of you have correctly noted, the best you can say is that you’ve tracked the shoe colours of people walking down Main Street between 12 and 1pm.
In the example of phone calls being random: even if you can dial random numbers, your sample is biased because you’re really only talking to the sub-group of people who are home when you call and willing to talk to you.
In our case, I ask our client “Do you have any reason to believe that this sample has any different opinions / shoe colours than the target population as a whole?” and we work through that trying hard to punch holes in the sample rationale looking for sources of bias.
As many have noted, sample randomness is a very big factor in political polling. It can be used to skew results quite dramatically. Even if the sample is as random as the technique allows, things like the day of the week contacted as well as the time of day have big influences. If you call during the afternoon you could have a high proportion of single parents or unemployed people replying. This group would obviously have a different view than working people on some issues. A liberal pollster may want that; a conservative one would want evening calls, favouring employed people.
You write as though pollsters with an agenda might introduce deliberate bias.
A friend whose student job was working on phone surveys said that she often filled her quotas of mothers of young children / students / unemployed within an hour, but would spend the rest of the week trying to populate the waged / full time employed categories because they were never home. And now mobiles have shot that all to hell as well.
So how does this change the accuracy of things if a person knows the population size and there’s no possible error introduced by the question?
Say there’s 500 girls at high school. A person notes down that 27 out of 50 of them are wearing dresses or skirts as opposed to pants or shorts. You know the population size and there’s no chance of inaccurate answers because they misunderstood the question or lied. Is it possible to quantify how accurate the data is?
It is possible to use the statistical power of the test to determine how likely you would be to accept a false hypothesis, which I think gets to what you’re asking.
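There’s also a standard adjustment worth mentioning here: when you know the population size and have sampled a sizeable fraction of it, the finite population correction shrinks the margin of error. A sketch for the 500-girls example (my addition; treat the exact figure as illustrative):

```python
import math

def margin_with_fpc(p, n, N, z=1.96):
    """Margin of error with the finite population correction:
    z * sqrt(p*(1-p)/n) * sqrt((N - n) / (N - 1))."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

p = 27 / 50                          # 54% observed in dresses/skirts
print(margin_with_fpc(p, 50, 500))   # ~0.13: 54% +/- ~13 points at 95% confidence
```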