I haven’t seen this question asked here, to my knowledge, but has there ever been an empirical verification that random sampling is actually a true predictor of results? Let me explain: there are 10 barrels of apples. Examiner chooses barrel 3 and discovers 10 rotten apples in it, and assumes that all 10 barrels will have 10 rotten apples. Has anyone actually examined all 10 barrels to see if that assumption was valid? If not, why use it? Someone might throw out all 10 barrels based on that one sample.
There is a bit to unpack here.
Nobody has ever suggested that such sampling is a 100% guaranteed predictor.
Moreover in the example there is a second assumption, that there is some connection between the different barrels. The nature of the connection needs to be carefully spelt out. Why would rotten apples have any connection between barrels?
In the simplest case, one could add the assumption that all the barrels contain apples from the same source with no further distinction possible. (Which is already open to question: apples in the same barrel may be more likely to come from the same tree, or be picked by the same person.)
Once you do this, random sampling of a small number of apples gives you an estimate of the true distribution of rotten apples. How good this estimate is depends on how many you sample from the total number. Obviously it gets better the more you sample.
What is important is that how good this estimate is can be calculated. So if, say, you want to know with a 90% chance of being right you can work out how many apples you need to check. Or the reverse, knowing how many apples total, and how many sampled, you can get the odds on the sample estimate being good.
So, you don’t need anyone to check a real row of barrels. Unless you check every apple you will never know for sure. But you can know what the odds are that you are right.
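To make that concrete, here’s a rough sketch of one such calculation (assuming the apples are well mixed, so each sampled apple is independently rotten with the same probability):

```python
import math

def apples_to_check(rot_rate: float, confidence: float = 0.90) -> int:
    """Smallest sample size n such that, if at least `rot_rate` of the
    apples are rotten, the sample contains a rotten one with
    probability >= `confidence` (independent-draw approximation)."""
    # P(no rotten apple in n draws) = (1 - rot_rate)**n;
    # force that below 1 - confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - rot_rate))

for rate in (0.10, 0.05, 0.01):
    print(f"rot rate {rate:.0%}: check {apples_to_check(rate)} apples")
# rot rate 10%: check 22 apples
# rot rate 5%: check 45 apples
# rot rate 1%: check 230 apples
```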
It is really no different to watching a coin toss. If you see someone toss a coin and get heads five times in a row, do you conclude the guy is really lucky or that the coin is a double header? What about 6, or 7, or 20?
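For a fair coin it’s easy to put numbers on that, since the chance of n heads in a row is (1/2)^n:

```python
# Chance of a fair coin landing heads n times in a row: (1/2)**n
for n in (5, 6, 7, 20):
    print(f"{n} heads in a row: 1 in {2**n:,}")
# 5 heads in a row: 1 in 32
# 6 heads in a row: 1 in 64
# 7 heads in a row: 1 in 128
# 20 heads in a row: 1 in 1,048,576
```

At five heads in a row (roughly a 3% event) you might still shrug; by twenty, the double-header is by far the better bet.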
I can say emphatically yes in my field [archaeology]. When people are designing sampling strategies (e.g. what benefit is there if we dig small test trenches scattered across the site vs. one long skinny trench down the middle? if we dig three times as much, do we get triple the useful info?), they will test these against real-world sites where the different success rates can be measured against each other.
Obviously a 100% sample [check every apple in every barrel] will give you an irrefutable answer, but the aim of sampling theory is to identify how much effort you need for the level of certainty you want. The sweet spot of finding out the most with the least effort needs to be demonstrated on real data before any lazy scientist will agree to it.
[Laziness in science is one of the key drivers to encouraging greater efficiency and reducing faffing about - there should be a Nobel Prize for Laziness].
It’s been too many years since I was versed in Statistical Process Control, but this is basically what you’re asking about with regard to manufacturing.
You sound like you’re the person selling the apples. If you are the person buying the apples, does your viewpoint change?
People test and demonstrate the basis of statistics all the time.
For example, here’s a toy that demonstrates the statistical properties of the bell curve:
Here’s discussion about testing statistical sampling:
Here’s an example where the randomness of dice were practically tested:
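In that spirit, and assuming the toy in question is something like a Galton board (balls bouncing left or right down rows of pegs), a minimal simulation shows the bell curve emerging:

```python
import random
from collections import Counter

ROWS, BALLS = 12, 10_000

# Each ball bounces left (0) or right (1) at each of ROWS pegs;
# its final slot is the number of rightward bounces.
slots = Counter(sum(random.choice((0, 1)) for _ in range(ROWS))
                for _ in range(BALLS))

for slot in range(ROWS + 1):
    print(f"{slot:2d} {'#' * (slots[slot] // 100)}")
# The counts pile up in the middle slots, tracing out a bell curve.
```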
“Factually Accurate” is a strange way to describe it.
The most important thing to take into consideration when sampling is
“is the sample a fair representation of the population you are interested in?”
What the OP describes is a very poor method of sampling; it isn’t random in any meaningful sense, and it runs the risk of not being representative. As such, any conclusions you draw run a high risk of being incomplete at best and completely misleading at worst.
Until that sampling methodology is fixed, I wouldn’t even begin to consider to what degree the sample is a good predictor of the population.
The big problem I have with the OP’s example is that the sample size is 1 (one barrel). That’s not how sampling works. You can’t just pick one item and assume that all the rest are similar.
Exactly so. Random sampling works just fine, for giving you a fairly accurate assessment of the entire population, without having to do a census (i.e., surveying every member of the population). But, in order for it to be accurate, you need two things (at least):
- A large enough sample size (typically in the hundreds to thousands)
- A sample that is representative of the population (e.g., if your population is 50% male and 50% female, your sample needs to reflect that)
Very true, and to be honest, given a choice (and sometimes in the real world we are limited in this way) if I had to have either a small sample that was representative or a huge sample that was not, I’d take the first every time.
In that instance you stand a chance of getting an accurate result that comes with some degree of uncertainty, whereas in the other case you get a pretty definite answer which will almost certainly steer you in the wrong direction.
I interpreted that there was a whole tractor load of apples that were then randomly sorted into barrels, so you are not sampling a single item, you are sampling however many apples fit into a barrel. That may or may not be what is really going on, though.

Very true, and to be honest, given a choice (and sometimes in the real world we are limited in this way) if I had to have either a small sample that was representative or a huge sample that was not, I’d take the first every time.
Speaking as a professional market researcher: I absolutely agree.

The big problem I have with the OP’s example is that the sample size is 1 (one barrel). That’s not how sampling works. You can’t just pick one item and assume that all the rest are similar.
Yes, I think we can all imagine reasons why one barrel might be a population all of its own. Some good examples were given upthread.

I interpreted that there was a whole tractor load of apples that were then randomly sorted into barrels, so you are not sampling a single item, you are sampling however many apples fit into a barrel. That may or may not be what is really going on, though.
If they were randomly deposited in the trailer and then randomly placed in barrels then, yes, potentially that’d be OK.
However, what if the time spent either in the trailer or the barrel meant exposure to conditions that influence apple condition? Exposure to light? Pressure? Temperature? Many other factors could be at play as well. The whole area of sampling is fascinating and a great exercise in identifying, considering and minimising the effects of variables.
Suppose you have 10 barrels of apples. You randomly select one, and you find that it contains rotten apples. (To keep things simple, we’ll just focus for now on whether or not there are rotten apples, not on how many.)
This tells you absolutely nothing for sure about the other nine barrels. It’s entirely possible that they’re all good. However, what you can say for sure is that, if the other nine barrels were all good, there’s only a 10% chance that you randomly chose the one barrel that had the rotten apples. From that we can conclude that the one you chose is probably not the only one with rotten apples.
Now suppose there were 1000 barrels of apples. You randomly select 10 of them, and 3 out of those 10 contain rotten apples. What does that tell you about the other 990 barrels? Nothing for sure. But your best guess, under the circumstances, would be that probably somewhere around 30% of them contain rotten apples.
But there are some things you can work out for sure. You can find the probability that, if those 3 were the only ones in all the 1000 that had rotten apples, your randomly-chosen sample of 10 would include all 3 of the bad ones. This probability would be very, very small. You can also calculate probabilities like: if only 10% of all the barrels had rotten apples, what’s the probability that you’d get at least 3 bad ones in your sample of 10? (You do this by mathematically counting how many different samples of 10 barrels you could select from a population of 1000, and how many of those would contain at least 3 bad barrels.)
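As a sketch, that counting can be done directly with binomial coefficients (using the 1000-barrel, sample-of-10 numbers above):

```python
from math import comb

POP, SAMPLE = 1000, 10

def p_exactly(k: int, bad: int) -> float:
    """Probability a random sample of SAMPLE barrels contains exactly
    k bad ones, when `bad` of the POP barrels are bad: count the
    favourable samples and divide by all possible samples."""
    return comb(bad, k) * comb(POP - bad, SAMPLE - k) / comb(POP, SAMPLE)

# If only 3 barrels in 1000 were bad, how likely is it that our
# sample of 10 caught all 3?
print(f"{p_exactly(3, 3):.2e}")   # ~7.2e-07: very, very small

# If 10% of the barrels (100) were bad, how likely is it that a
# sample of 10 contains at least 3 bad ones?
print(f"{sum(p_exactly(k, 100) for k in range(3, 11)):.2f}")  # ~0.07
```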
From calculations like this, you can be a lot more specific about quantifying “probably” and “somewhere around” in a statement like “probably somewhere around 30% of all the barrels contain rotten apples.”
Sampling is factually accurate. The field of statistics has treated this quite thoroughly. However, you need to understand what it does and doesn’t say.

Examiner chooses barrel 3 and discovers 10 rotten apples in it, and assumes that all 10 barrels will have 10 rotten apples.
First of all, in your example the examiner is ignorant of statistics. That is not at all what the examiner should assume.
Sampling never gives just a number; it gives a probability. Statistics has a very rigorous and proven method that says: if you sample n items from a population of P, and characteristic c shows up in some fraction of the sample, then that characteristic occurs in the entire population at that same rate ±e, where e is the error range, with a confidence of x%. We usually select a sample size and error range that will give us a confidence level of 95%. Have you ever seen a political poll where they give a margin of error? That is where that comes from.
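As a rough sketch of where that polling margin of error comes from (the usual normal approximation for a simple random sample; 1.96 is the z-value for 95% confidence):

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for an observed sample proportion p_hat
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A poll of 1000 people with a 50/50 split: the familiar "+/- 3 points"
print(f"+/- {margin_of_error(0.5, 1000):.1%}")  # +/- 3.1%
```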
Well, sampling one barrel does give you information, but the error is so large that it wouldn’t tell you anything useful. And that’s assuming the sample was good.

Once you do this, random sampling of a small number of apples gives you an estimate of the true distribution of rotten apples. How good this estimate is depends on how many you sample from the total number. Obviously it gets better the more you sample.
Up to a point. The number of samples you need from a population of 100 million is not much more than from a population of 10 million. That’s why political polls can get away with relatively small samples. Getting a representative sample is a much bigger problem.
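A quick sketch of why, using the standard finite-population correction: the sample size needed for a given margin of error barely moves as the population grows.

```python
def needed_sample(N: int, e: float = 0.03, z: float = 1.96,
                  p: float = 0.5) -> float:
    """Sample size for a +/-e margin of error at 95% confidence from a
    population of N, using the finite-population correction."""
    n0 = z**2 * p * (1 - p) / e**2   # size needed for an infinite population
    return n0 / (1 + (n0 - 1) / N)

for N in (10_000, 10_000_000, 100_000_000):
    print(f"population {N:>11,}: sample {needed_sample(N):,.0f}")
# population      10,000: sample 964
# population  10,000,000: sample 1,067
# population 100,000,000: sample 1,067
```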
One of my favorite books was Sampling Theory by Cochrane which I rescued from disposal at my work library. Used it often before you could just look it up online.

Once you do this, random sampling of a small number of apples gives you an estimate of the true distribution of rotten apples. How good this estimate is depends on how many you sample from the total number. Obviously it gets better the more you sample.
One way to determine the adequacy of a sample is to randomly select a subset from your sample and run a Student’s t-test. As you increase the size of your subset, the sample means of each subset should approach each other (and the sample mean of your original sample); if they don’t, you need a larger sample. The size necessary for significance, assuming that the characteristic or condition being measured is truly random, depends upon the incidence. If you find 10 rotten apples in a barrel of 100, then only a small sampling should be needed; if it is 10 out of 10,000, the likelihood of any particular sampling catching even one rotten apple is quite low (a sample of 50 apples has less than a 5% chance of containing even a single rotten one). For a binary condition like this it is very easy to analytically determine how large a sample is likely to produce an accurate representation within defined statistical bounds, but for a more complicated discrete or continuous metric it can be very difficult to figure out the appropriate sample size, and empirical rules are often used instead.
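For the 10-in-10,000 case, a quick sketch of that binary calculation (independent-draw approximation):

```python
def p_detect(n: int, incidence: float = 10 / 10_000) -> float:
    """Chance a random sample of n apples contains at least one
    rotten apple, when rot occurs at the given incidence."""
    return 1 - (1 - incidence) ** n

for n in (50, 500, 3000):
    print(f"sample of {n:>4}: {p_detect(n):.1%} chance of seeing any rot")
# sample of   50: 4.9% chance of seeing any rot
# sample of  500: 39.4% chance of seeing any rot
# sample of 3000: 95.0% chance of seeing any rot
```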
This gets to another issue specific to this example, however: rotten apples are not likely to be an unbiased sample because, as the saying goes, “One bad apple spoils the barrel” (a pithy saying most recently applied to corruption, which often misses the actual point that a single corrupt official often stimulates corruption in others). In the case of apples, spoilage is the result of one of three causes: bruising of the fruit, which releases ethylene gas and causes nearby fruit to prematurely ripen and spoil; apple maggots (‘worms’) that burrow through the apple, also releasing ethylene; or fungal infestation. Obviously, any of these three in a single apple will quickly cause nearby apples to spoil too, which means that rot within a barrel is not randomly distributed; barrels are likely to have either no spoiled apples or many spoiled apples. This kind of applied knowledge about the system or product being sampled is often missed by statisticians who just look at the numbers, but it is crucial to determining whether a sampling method and an assumed distribution represent reality.

Up to a point. The number of samples you need from a population of 100 million is not much more than from a population of 10 million. That’s why political polls can get away with relatively small samples. Getting a representative sample is a much bigger problem.
Sampling theory is often given superficial coverage, if any, in basic statistics classes, and even when it was addressed in the ‘Six Sigma’ certification class I was compelled to take by a previous employer, it was covered so poorly that it wasn’t really useful. Actually determining the necessary sample size is very complicated for pretty much any real-world phenomenon, and failing to do so is a basic error in a lot of statistical process control.
Stranger
Right, that would be poor sampling. However, if you did open 1 barrel in ten and found issues, you would be justified in opening a few more.
Now, if there were 1000 barrels and a sample of 100 were all found to be rotten, we could reliably conclude all were bad (making several assumptions, of course). As the numbers get larger, say 100,000 barrels with a sample of 10,000, the sampling gets closer and closer to being “factually accurate”.
A real problem is determining whether the portion you are sampling is truly random. The more complex the interactions that produced the group you are sampling, the less likely you are to end up with a truly random and representative portion. Take the apple example: at what point in the apple supply chain are you taking your random sample? A warehouse that has 1000 barrels shipped in from various parts of the country? You now have many discrete variables that only apply to certain subgroups of barrels.
Extrapolating from random testing is more accurate the fewer variables are present in the test group.
Many results of random testing have been wildly incorrect, due to the test group not actually being random to begin with.
It’s been a half century since I had any statistics classes, but from what I recall, this would be a better procedure:
Given 10 barrels of apples, take 1/10th of a barrel from each one and check those apples for rotten ones. You would end up checking the same number of apples as in the OP’s procedure, where you checked all the apples in one barrel.
Am I correct that this procedure would be likely to produce a more accurate estimate? It seems like it would ensure a more random selection of apples.
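One way to check this without a statistics class is to simulate it. A sketch, assuming (as suggested upthread) that rot clusters within barrels, comparing the OP’s whole-barrel sample against the tenth-of-every-barrel scheme:

```python
import random
import statistics

BARRELS, APPLES = 10, 100    # 10 barrels of 100 apples each

def make_barrels() -> list[list[bool]]:
    """Rot clusters by barrel: each barrel is either mostly sound
    (2% rot) or bad (40% rot)."""
    rates = [0.40 if random.random() < 0.3 else 0.02 for _ in range(BARRELS)]
    return [[random.random() < r for _ in range(APPLES)] for r in rates]

def one_barrel_estimate(barrels: list[list[bool]]) -> float:
    """OP's method: inspect every apple in one randomly chosen barrel."""
    return sum(random.choice(barrels)) / APPLES

def stratified_estimate(barrels: list[list[bool]]) -> float:
    """Proposed method: a tenth of every barrel (same 100 apples total)."""
    per_barrel = APPLES // BARRELS
    return sum(sum(random.sample(b, per_barrel)) for b in barrels) / APPLES

errors = {"one whole barrel": [], "tenth of each barrel": []}
for _ in range(2000):
    barrels = make_barrels()
    truth = sum(map(sum, barrels)) / (BARRELS * APPLES)
    errors["one whole barrel"].append(abs(one_barrel_estimate(barrels) - truth))
    errors["tenth of each barrel"].append(abs(stratified_estimate(barrels) - truth))

for name, errs in errors.items():
    print(f"{name:>20}: mean absolute error {statistics.mean(errs):.3f}")
# The tenth-of-each-barrel estimate typically lands far closer to the truth.
```

Because the one-barrel estimate inherits whichever barrel you happened to pick, its error is dominated by barrel-to-barrel variation; spreading the same 100 apples across all ten barrels averages that variation out.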