I don't understand the numbers/charts/graphs of COVID-19

Do any of these graphs and maps showing the spread of COVID-19 make sense to anyone else? I can’t tell if I just don’t understand them because I probably have undiagnosed dyscalculia (difficulty learning and understanding math) or because deaths and cases are being imperfectly reported and we just don’t have enough information yet and won’t know what’s really going on until months (and months?) from now.

People keep saying oh it’s getting better in NY and other hot spots, the curve is flattening, yada yada. It’s not as bad in Southern states because the population is less dense, the weather is warming, etc etc whatever, and it’s okay to start loosening restrictions.

I think some of these politicians have their own ulterior motives, which aren’t supposed to be discussed in this forum, but I can’t even tell if the numbers are actually as they claim or not.

Does anybody know if the curve is actually flattening or going down (US as a whole versus particular states that want to loosen) or if it’s too soon to tell? (Of course there could be a second wave but we haven’t gotten to that point yet AIUI). Is there any website that shows the graphs and explains things for math duds like me?

Antibody tests in NY state found that urban areas had a 10-20% infection rate, meaning the virus has already infected 10-20% of people.

But the rest of the state had only about a 3-4% infection rate. So still a lot of room for the virus to grow and explode.

This site might help with the data, Skypist, although it doesn’t really have much text explanation: An interactive visualization of COVID-19 | 91-DIVOC

I’m not clear from your post whether your graph problem is with:

  • whether the underlying information on cases and deaths per day, and trends is reliable - always a fair question, as the graph is only as good as the info it shows. Other threads have noted that different countries were counting data in different, non-comparable ways, so showing them on graphs that compare counts is a bit deceptive. Remember both China and NY had to do big adjustments to their death figures when they realised that they were carefully counting hospital deaths but missing out on home and nursing home deaths. Some countries test like crazy and they will have higher case counts than those who restrict testing. I do not know if US state authorities all count the same way.

  • understanding the graphs - Yes this can be hard, and assumes a whole bunch of visual and cognitive shortcuts that may not be available to a reader. 91-DIVOC mentioned by ENugent above is very good for at least playing with how the graphs can be enhanced to highlight different things like picking out specific countries, or changing scale. A few things to note if this is your issue:

  • starting point - 91-DIVOC shows graphs starting when the country’s cases reach 100 to ensure a bit of consistency. It also excludes the tiny countries like Andorra and Vatican City which have apparent high case loads.

  • log / normal scale - pay attention to the Y [left vertical] axis. Sometimes the intervals go up in even steps, sometimes each interval is 10X the previous. This changes the apparent shape of the curve. It’s mainly a representational thing; the numbers feeding the graph do not change.

  • where the line you are interested in sits on the page - the higher up, the more cases/deaths; the lower, the fewer. If the curve is sloping [USA], the steeper the slope, the more cases/deaths per day; the flatter it is [eg China], the fewer new cases/deaths. The further to the right a country’s last data point is, the longer it has been since its count started at 100 cases.

  • interpreting the data - While graphs are supposed to crystallise the data so there is little left for different readers to disagree about, this is still open. ‘Flattening the curve’ is a nice visual metaphor to explain what is wanted, and you can see it in comparison charts where some are doing much better than others. There will be lag between real-time and graphs, but that is not a big component. The big omissions are probably deaths by complications of infection, which are masked at an individual level where Uncle Fred, who has been sick for years, is pushed over the edge. Unless he is tested and probably autopsied the role of the virus in his death is unlikely to be counted. The only way to measure these is after the event when excess deaths can be calculated by comparing total mortality each month to what it was like in the past few years.
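For anyone who finds arithmetic easier to follow as a worked example, the excess-deaths idea in the last bullet can be sketched in a few lines of Python. All the figures below are made up for illustration, and the function name is hypothetical; real analyses also adjust for population changes and seasonal trends, which this sketch ignores.

```python
def excess_deaths(observed_deaths, same_month_in_past_years):
    """Deaths above the average for the same month in earlier years."""
    expected = sum(same_month_in_past_years) / len(same_month_in_past_years)
    return observed_deaths - expected


# Illustrative numbers only, not real data.
past_aprils = [10000, 10400, 9900]    # total deaths each April in prior years
this_april = 13100                    # total deaths this April, all causes
reported_covid_deaths = 2000          # deaths officially attributed to the virus

excess = excess_deaths(this_april, past_aprils)
possibly_uncounted = excess - reported_covid_deaths
print(f"Excess deaths: {excess:.0f}, beyond the official count: {possibly_uncounted:.0f}")
```

The gap between the excess and the official count is only a rough hint at the "Uncle Fred" cases; it can also include deaths from other knock-on effects.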

All of those things really. I can’t really tell if a chart is valid or the info is skewed by various factors or whether the info shows what is being claimed.

One thing that throws me off is that the bars will show a progressively higher number of cases (as opposed to deaths) per day, but not by a lot, just a small increase for several days in a row, but this is proclaimed as the curve being flattened and everyone can go to the beach now or whatever. But cases are still increasing, isn’t that still not a good sign? So maybe I’m just not able to interpret the info correctly.

I have no way to tell if apples and oranges are being compared or what.

Let’s say a person gets sick on the 1st. They go in to see the doctor on the 5th and they have all the symptoms so they are scheduled for a test on the 10th. The results come back on the 15th and they have it.

So what date would they be reported as being diagnosed? As tested? On the 5th, 10th, or 15th?

There are too many numbers. More numbers than I wish to try to wrap my mind around. I just keep track of daily virus deaths in the US. It seems a simple metric of “how bad is it?”

The total number of cases is always going to keep going up. Most of these graphs aren’t measuring the number of people who are currently sick, but the number of people who have ever been confirmed to have covid-19.

“Flattening the curve” is all about how fast the numbers go up. If the country has 500 new cases a day, that’s not too bad, the hospitals can easily treat that number of people. If the country has 500,000 new cases a day, the hospitals would be overwhelmed.

A graph makes it easier to see whether the numbers are going up faster or slower than they used to. We want the right-most part to be as flat/horizontal as possible. If it’s flatter on the right than the middle, we’re in good shape. If it’s steeper on the right, then we’re not.
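If it helps to see the "flatter on the right than the middle" check done with numbers instead of by eye, here is a minimal sketch. The case counts are invented, and the function names and the three-day window are my own assumptions, not anything the charts themselves use.

```python
def daily_new_cases(cumulative):
    """Turn a running total into new cases per day."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def is_flattening(cumulative, window=3):
    """True if the most recent days add fewer cases, on average,
    than the same number of days just before them."""
    new = daily_new_cases(cumulative)
    if len(new) < 2 * window:
        raise ValueError("need at least 2 * window days of new-case data")
    recent = sum(new[-window:]) / window
    earlier = sum(new[-2 * window:-window]) / window
    return recent < earlier


# Illustrative running totals, not real data.
totals = [100, 150, 250, 400, 600, 750, 850, 900]
print(daily_new_cases(totals))   # the total always rises; the daily steps do not
print(is_flattening(totals))
```

Note that the total goes up every single day in this example, yet the curve still counts as flattening because the daily steps are shrinking.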

Of course what we really want to know is when can we go back to normal again. How flat does it have to be before we can leave the house? Unfortunately, we can’t tell that from a chart. The charts measure how much it’s spreading right now, under a lockdown. If the curve is flat because of the lockdown, then unless we have another way to keep the spread down, ending the lockdown will make the curve go steep again.

So when someone tells you that we’re ready to reopen, don’t ask whether the curve is flat enough, ask what the plan is to keep it from unflattening. The options I’m aware of include mass vaccination (we’re nowhere near having this available), enough testing and tracking that we can reliably quarantine everyone who actually has it (we’re nowhere near this either, though a few countries have managed it), or intentionally spreading the disease enough to develop herd immunity and accepting the deaths this will cause as the cost of reopening the economy.

In the antibody tests I saw reported in articles recently, they were NOT taking samples randomly for the test. They were testing people who were more likely to have had the virus. You can’t extrapolate those numbers to the general population.

Those tests have also been found to be completely meaningless, given the false positive error rate associated with them.

And some of the graphs are just plain weird.

That depends on the numbers. If a test has a 5% false positive rate, but the test results are coming back 20% positive, there must be an awful lot of real positives in there. You still can’t say much about any individual with a positive result, but you can definitely say things about the population.
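To put rough numbers on that, here is a small sketch using the standard Rogan-Gladen correction for an imperfect test. The 5% and 20% figures are the hypothetical ones from the paragraph above; the assumption that the test catches every true case (sensitivity of 1.0) is mine, and the function name is made up.

```python
def estimate_true_prevalence(observed_positive_rate, false_positive_rate, sensitivity=1.0):
    """Rogan-Gladen estimate of the real infection rate in the tested group.

    observed = true * sensitivity + (1 - true) * false_positive_rate,
    solved for `true` and clamped to the range 0..1.
    """
    denominator = sensitivity - false_positive_rate
    if denominator <= 0:
        raise ValueError("test is no better than chance; no estimate possible")
    estimate = (observed_positive_rate - false_positive_rate) / denominator
    return min(max(estimate, 0.0), 1.0)


# 20% of results positive, 5% false positive rate (hypothetical figures).
high = estimate_true_prevalence(0.20, 0.05)
# 4% of results positive with the same test.
low = estimate_true_prevalence(0.04, 0.05)
print(f"High-positive area: about {high:.1%} truly infected")
print(f"Low-positive area: about {low:.1%} truly infected")
```

With 20% positives, most of them have to be real. With only 4% positives and a 5% false positive rate, the result is indistinguishable from zero, which is where the "meaningless" complaint has some force.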

Coronavirus Antibody Tests: Can You Trust the Results?

What do you consider skewed? If you look at the plot of total number of COVID-19 cases, I don’t think the number is skewed. But this is just the number of people who tested positive for COVID-19. If you consider the number to be a proxy for the actual number of people infected by COVID-19, then it absolutely is skewed. Because only a small number of people are getting tested, and some areas (states / countries) are testing more people than others.

Initially, a disease spreads exponentially. That means the rate of increase is increasing - i.e. every day, there are more new cases than the day before. That’s bad.

So if we can get that under control, so that the rate of increase is at least constant - i.e. every day, there aren’t more new cases than the day before - that is progress.

The next step would be to start reducing that number - i.e. every day, there are fewer new cases than the day before. But even at that stage, the disease is still spreading. The pandemic isn’t over until that number goes to zero.
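Those stages can be written out as a tiny classifier over new cases per day. This is only a sketch with invented numbers and a made-up function name; real daily counts are noisy, so in practice you would smooth them (for example with a 7-day average) before comparing days.

```python
def describe_trend(daily_new):
    """Label a run of new-cases-per-day figures with one of the stages above."""
    if all(n == 0 for n in daily_new):
        return "over: no new cases"
    changes = [later - earlier for earlier, later in zip(daily_new, daily_new[1:])]
    if all(c > 0 for c in changes):
        return "bad: more new cases every day"
    if all(c == 0 for c in changes):
        return "progress: same number of new cases every day"
    if all(c < 0 for c in changes):
        return "better: fewer new cases every day, but still spreading"
    return "mixed: no clear direction"


# Illustrative figures only.
for series in ([100, 200, 400, 800], [500, 500, 500], [500, 400, 300], [0, 0, 0]):
    print(series, "->", describe_trend(series))
```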

A lot of the people proclaiming the curve has flattened and we can go to the beach now are talking out of their posteriors.

What I meant by “skewed” was comparing the US to Italy (Italy’s population leaning older) or NYC to Montana, stuff like that where it’s probably not an even comparison.

Of course there are many differences between different regions, some (or none or all) of which may affect how fast COVID-19 spreads. Demographics, rate of pre-existing medical conditions, when the disease first started spreading in that area, number of people tested, criteria for choosing who gets tested, when schools were closed, when stay-at-home order was issued, number of hospital beds per capita, whether N95 masks are available for health care workers, whether masks were recommended for everyone, population density, weather, percentage of people who use public transport, BCG vaccination rate, percentage of people who shake hands vs. bow, percentage of people who wear shoes indoors, etc.

So are you saying it’s only useful to compare areas that are identical in every respect?

This site is pretty good

The graphs are measuring a couple of different things.
Total cases or total deaths are cumulative. They would be displayed as a rapidly increasing “ski slope” and then gradually flatten out as the rates decrease.

When switched to “logarithmic”, each tick on the y axis represents a 10x increase over the previous tick, i.e. 1, 10, 100, 1000, etc. This is useful for displaying exponentially increasing data. Total cases or total deaths will appear as a roughly straight line while growth is exponential, until the curve “flattens”, at which point the line bends over and levels off as it approaches some final number.
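A quick way to convince yourself of the straight-line claim: on a log scale, equal vertical steps mean equal multiplication. The sketch below uses invented totals and a hypothetical helper name, and just measures how far each day's point rises on a log10 axis.

```python
import math


def log10_steps(values):
    """How far each point rises above the previous one on a log10 axis."""
    logs = [math.log10(v) for v in values]
    return [round(later - earlier, 6) for earlier, later in zip(logs, logs[1:])]


# Illustrative totals only.
doubling_daily = [100 * 2 ** day for day in range(6)]   # 100, 200, 400, ...
slowing_down = [100, 200, 300, 350, 375]

print(log10_steps(doubling_daily))   # identical steps: a straight line
print(log10_steps(slowing_down))     # shrinking steps: the line bends over
```

The same doubling series on a normal (linear) axis would look like a wall shooting upward, which is why switching scales changes the picture without changing the data.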

Cases or deaths per day would typically be similar or close to a “normal distribution” (looks like a hill): they increase to some maximum rate, then decrease on the reverse slope. When they say “flatten the curve”, the idea is to reduce these daily rates so that they don’t exceed the local hospital capacity (represented as a horizontal line).

No, I’m not, but I’ve seen other people saying this, or at least some differences are too different to make a valid comparison, and I don’t know whether to take it seriously or not. I don’t have that statistical background to know myself and am having to rely on others’ interpretations, I guess is what I’m saying.

A recent article by Cassie Kozyrkov discusses the NYC numbers.

(Not especially criticizing the study, because it is much less feasible to do a study of people who have been barricading themselves in their apartments since March, but it puts some colour on the interpretation of the results).

It’s not the comparison that’s valid or invalid. You have to look at what conclusion is being drawn from the comparison, and ask whether that conclusion is valid or not.