I don’t know if there is a factual answer to this or not so I am putting it here.
I know this has been asked rhetorically in the past, but I don’t recall ever seeing the question posed as an actual serious question.
When does a collection of anecdotes become data?
For example, let’s say the conversation is about physical prowess in women vs men.
It is a very common thing to see folks saying that women have less strength and endurance than men, and that it’s genetic, and for that statement to be accepted as fact.
I come along and say, “no, that’s not true. I have direct experience of multiple tests over a 10-year period, and there are hundreds of thousands and possibly millions of records, all taken using the same standards and methods over a period of decades by the same four or five organizations, that directly contradict that statement.”
Are my 10 years and the hundreds of thousands or possibly millions of records data or anecdotes?
When someone provides evidence that the anecdotes are true.
In your example of male/female athletic ability, there are all the historical records from contests showing that, for example, men are mostly faster, stronger and bigger than women.
For starters, when the anecdotes are systematically collected. If I just tell everyone “I think one gender is stronger than the other; come tell me your experiences”, then I might just get strong women and weak men choosing to respond to my poll, because people like disproving things that “everyone knows”. On the other hand, if I randomly select 1000 women chosen uniformly from the entire population, and similarly randomly select 1000 men, and test all of their strengths, then I’ve got data.
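To make the difference between the two collection methods concrete, here is a rough simulation sketch. Everything in it is made up for illustration: the strength scores, the identical distributions for both groups, and the cutoffs for who bothers to answer the open call are all assumptions, not real measurements. The function name `compare_collection_methods` is hypothetical too.

```python
import random
import statistics

def compare_collection_methods(seed=0, population_size=100_000, sample_size=1000):
    """Return (random_gap, volunteer_gap), each being mean(men) - mean(women)."""
    rng = random.Random(seed)

    # ASSUMPTION: both groups get the SAME made-up strength distribution,
    # so any gap that shows up comes from how the sample was collected.
    women = [rng.gauss(50, 10) for _ in range(population_size)]
    men = [rng.gauss(50, 10) for _ in range(population_size)]

    # Systematic collection: 1000 people chosen uniformly from each group.
    random_gap = (statistics.mean(rng.sample(men, sample_size))
                  - statistics.mean(rng.sample(women, sample_size)))

    # Open call for stories: ASSUME only unusually strong women and
    # unusually weak men choose to respond.
    volunteer_women = [score for score in women if score > 65]
    volunteer_men = [score for score in men if score < 35]
    volunteer_gap = statistics.mean(volunteer_men) - statistics.mean(volunteer_women)

    return random_gap, volunteer_gap

random_gap, volunteer_gap = compare_collection_methods()
print(f"gap from random sample:     {random_gap:+.1f}")
print(f"gap from self-selected poll: {volunteer_gap:+.1f}")
```

Under these assumptions the random sample should report a gap close to zero, which is the truth in this toy population, while the self-selected poll reports a large gap that exists only because of who chose to answer.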
Anecdotes are on the bottom rung of evidence. As the saying goes, the plural of anecdotes is not data. They’re a starting point for investigation, not remotely close to anything one can base a conclusion on.
I realize this is an unwelcome fact, even on the Dope, where there have been a number of threads asking for other people’s anecdotes about something (generally a drug/supplement or medical treatment) and specifically requesting anecdotes only, not links to research studies. Question the usefulness of someone’s anecdote and you may be accused of calling them a liar (it happened to me here recently).
Anecdotes have their uses - they are commonly what we base our restaurant and movie choices on, for instance. Trusting them on important matters is riskier.
When they can be proven to be statistically encompassing and not selective.
I know what you’re getting at here - disputing the phrase “The plural of Anecdote is not data.” The reason for that is that “data” is all encompassing - it includes the positives and the negatives; “anecdotes” are, by definition, individual stories from one point of view.
No one tells the story of going to downtown Chicago and not being robbed. The stories you hear are all of being robbed. Using just the anecdotes, you would think the crime rate in The Loop is much higher than it is.
Just to be clear, the original quote was the opposite: the plural of anecdote is data. The bastardization of the quote to say the opposite comes about because there is a bias to think that “data” was collected correctly and “anecdotes” are collected haphazardly. And while this might be often true, it isn’t inherent to the definitions. You can have good anecdotes and crap data.
So to answer the OP’s question, if you’re asking about semantics, then there isn’t a difference between anecdotes and data. Anecdotes are a kind of data. If you’re asking about what makes data good, that’s a big topic that includes things like statistical significance, elimination of bias, and a bunch of other things. I learned that good data is marked by accuracy, completeness, reliability, relevance, and timeliness, but I’m sure there are other lists out there.
Well, there is that. But, even assuming your anecdotes come from a trustworthy source, I can think of at least two major problems with treating anecdotes as data.
One is sample size. Anecdotal evidence often comes in the form of a small number of cases, or even just a single case, which is too small a sample from which to effectively generalize.
But another, even bigger problem, is that the anecdotes typically do not consist of a random or unbiased sample of the population they’re purported to represent, so there may be no good reason to assume that generalizing from them is valid.
Treating anecdotes as data thus subjects you to cognitive biases such as the Availability Heuristic.
There’s also the notion that in order for it to be meaningful data it has to be plural. That isn’t necessarily so. You catch one living swimming coelacanth, at a time when they were believed to have gone extinct millions of years ago, that’s extremely meaningful data.
Anecdotes are absolutely data. They are just weak data. They are subject to all the issues previously mentioned, like collection bias, reliability of the person relating the anecdote, statistical weakness, and being the wrong sort of data to test the hypothesis (the rooster crowing before dawn doesn’t tell you whether the crowing influences the sun rising unless you have some way of making roosters crow at times when you don’t expect the sun to rise, for instance.)
But if your hypothesis is “there is no x” and someone has an anecdote of the existence of x, either that person is wrong, or your hypothesis is false. Because that anecdote is data.
What typically happens is someone states a questionable proposition, relying on anecdote(s) and personal conviction. Others note defects in the proposition and are met with “well, prove me wrong!”. It’s not the job of the person doubting “x” is true to prove a negative; it’s up to the claimant to establish the veracity of their claim. Anecdotes alone are insufficient, however hardwired we are to accept them.
Here’s a good summary of the deficiency of anecdotes as they relate to science and medicine. Note that physicians are not immune to the lure of anecdotal observations.
OK, so from my example, my personal experience of 10 years of periodically observing women performing better than men on a standardized physical performance test is an anecdote. But are the decades of such tests, performed on thousands of men and women, data? Or do they remain in the realm of anecdote because these men and women are a self-selected group who chose to be in a position to be subjected to the test, and who may or may not represent the population in general?
I’m not trying to make some end run around an argument or anything like that. I’m trying to understand the finer points of difference between anecdote and data.
@Thudlow_Boink, I had not thought of that before in that way, but availability heuristic makes sense to me.
To be data, the multiple anecdotes really just need to report on the same thing. I can’t put together some stories about eating out, bicycling, and home repair in a pile and hope to get anything useful. But what if I go through the stories and record how often people eat out, how frequently they bicycle, and how much they spend on their house? Then I might have something.
This, so much this.
Lots of people above are describing things that are data, they’re just bad data.
Self-report is a long-established foundation of data collection. It is not always going to produce good data, but that is an empirical question. For example, many studies ask people how much they weigh, and also measure their weight. They’ve created a dataset to test the reliability of self-reported weight. Maybe we find out the reliability is pretty good, so no need to measure it. Maybe we find out it is bad, so we’d better make the effort to measure it.
Not all data needs to be representative of the population. If the researchers know their data is biased, or non-representative, then they can often take steps to manage that. Sometimes it’s even desirable. The problem is when the researchers think their data is unbiased, but it is not.
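As a sketch of what that reliability check could look like: pair each self-reported weight with the measured one and look at the differences. The pairs below are invented for illustration, and `self_report_bias` is a hypothetical helper, not something from any particular study.

```python
import statistics

def self_report_bias(pairs):
    """pairs: list of (self_reported, measured) weights in the same unit.

    Returns (mean difference, sample standard deviation of the differences).
    A mean far from zero suggests people systematically over- or under-report;
    a large standard deviation suggests the reports are noisy.
    """
    differences = [reported - measured for reported, measured in pairs]
    return statistics.mean(differences), statistics.stdev(differences)

# Made-up example data, not real measurements.
example_pairs = [(150, 154), (180, 183), (200, 199), (130, 135)]
bias, spread = self_report_bias(example_pairs)
print(f"average self-report minus measured: {bias:+.2f}")
print(f"spread of the differences:          {spread:.2f}")
```

Whether a given bias or spread is small enough to skip the measuring step depends on what the study needs the weight for; the code only reports the numbers.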
This is a problem with all datasets. Will it generalize? Does a study done on a bunch of white, middle-class, college students generalize to the whole population? Sometimes maybe yes, other times definitely no.
People are bad at drawing conclusions from their personal experience. Maybe you think women frequently perform better than men, because you’re surprised when that happens, and you ignore the majority of instances where the men perform better. Just an example of observer bias, not claiming that’s what’s actually happening.
Now, if you have a way to go back and look at the years of results, then you can draw some conclusions.
The scientific paper will say something like
On the StdPhys test, women on average score 1.2 points higher than men. That is a significant difference with p < 0.001 with an n of 14,568 women and 16,921 men. The women were all training for triathlons, while the men were recruited from an assisted living facility.
The popular press headline will say
Women triathletes perform better than men
and then a bunch of Internet commenters will point out the bias in the article, as if the scientists who wrote it were unaware of it.
(No idea what the point of the study was comparing women athletes against men in assisted living facilities, but I’m not a co-author, so don’t blame me.)
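For what it’s worth, here is a rough sketch of how a difference that small can still come out as “significant” with samples that large. It uses a two-sample z-test under a normal approximation. The made-up paper above gives no standard deviations, so the value of 10 points per group below is purely an assumption, and `two_sample_z` is a hypothetical helper.

```python
import math

def two_sample_z(mean_diff, sd_a, n_a, sd_b, n_b):
    """Two-sample z-test from summary statistics (normal approximation).

    Returns (z, two-sided p-value).
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = mean_diff / standard_error
    # Two-sided tail probability of a standard normal beyond |z|.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# ASSUMPTION: a standard deviation of 10 points in each group.
# The 1.2-point difference and the group sizes come from the made-up paper.
ASSUMED_SD = 10.0
z, p = two_sample_z(1.2, ASSUMED_SD, 14_568, ASSUMED_SD, 16_921)
print(f"z = {z:.2f}, p < 0.001: {p < 0.001}")
```

The tiny p-value only says the difference between these two samples is unlikely to be chance. It says nothing about whether triathletes and assisted-living residents stand in for women and men in general, which is the part the headline drops.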
Not to dispute your take on “what typically happens,” but I think @puzzlegal’s point is that, in the case of (logical) universals (“all” or “none” statements, like “All dodos are extinct” or “No dodos are alive”), a single piece of anecdotal evidence (e.g. “I saw a living dodo in my back yard”), if accurate, is enough to disprove the statement (or, equivalently, to prove its logical equivalent: “some dodos are not extinct” or “some dodos are alive”).
It obviously depends on the hypothesis. But I was thinking of hypotheses exactly like, “coelacanths went extinct millions of years ago”. And a single strong anecdote refutes that proposition.
If your hypothesis is “this medical treatment will help a few people”, you need a hell of a lot of anecdotes to have anything useful. But even there, anecdotes can be useful, if only to suggest areas to study.
So, details matter, is what I’m taking from your post.
My example is simplistic and coarse, lacking many details, so now I’ll add some.
While serving in the army for a period of 10 years, I observed, about 3 times per year, women performing better than men on a standardized physical fitness test used throughout the military.
I know that there are records archived from hundreds of thousands of service members that show women performing better than men over a period of several decades. I know that while the collection of those records may vary in minor ways on a case-by-case basis, overall the collection of the information was very uniform and standard.
Are those records data, or anecdote? The details I added are these: the men and women subjected to the test were military service members, a self-selected group that may or may not represent the population generally.
How many women did you test, and how many men? What was the average score of each gender? What standard deviation?
If you’ve actually been collecting data (i.e., recording all of the results), then all of these questions are very easy to answer. If you can’t answer these questions, then you haven’t been recording your results, and are just relying on your memory. And your memory probably sucks. That’s not meant to be an insult; human memories in general suck.
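To show how little work those questions take once the results are actually written down, here is a minimal sketch. The scores are made up and `summarize` is a hypothetical helper; the point is only that count, average, and standard deviation fall straight out of a recorded list.

```python
import statistics

def summarize(scores_by_group):
    """Return n, mean, and sample standard deviation for each group."""
    summary = {}
    for group, scores in scores_by_group.items():
        summary[group] = {
            "n": len(scores),
            "mean": statistics.mean(scores),
            # Sample standard deviation; needs at least two scores per group.
            "stdev": statistics.stdev(scores),
        }
    return summary

# Made-up scores for illustration only, not real test results.
recorded = {
    "women": [250, 270, 290],
    "men": [240, 270, 300],
}
for group, stats in summarize(recorded).items():
    print(group, stats)
```

If all you have is a memory of who tended to do better, there is no list to feed into something like this, and that is the difference being pointed at.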