New York University

What Numbers Can—and Can’t—Tell Us About the Pandemic

Data scientists identify common COVID-19 statistical pitfalls
14-Jul-2020 6:20 PM EDT, by New York University

Newswise — Currently, we are confronted, around the clock, with troubling data as reporters, public health experts, and elected officials seek to understand and describe the path and impact of COVID-19—poring over rates of infection, hospital admission, and death, to name just a few key indicators. 

With so many numbers to digest, it can be challenging to separate statistics that may mislead from those that illuminate—something that has complicated the decision-making of government officials, according to recent news accounts. And while the widespread suspicion that numbers can be manipulated to support almost any conclusion predates the pandemic, partisanship around the response to the virus has further undermined Americans’ trust in COVID-19 data, according to a recent Pew Research Center survey.

But statistics are, of course, vital to understanding the current crisis, as well as other complex problems such as poverty, economic downturns, and climate change, and so researchers stress the importance of learning to distinguish what’s useful from what may be junk. 

“We suspect that statistics may be wrong, that people who use statistics may be ‘lying’—trying to manipulate us by using numbers to somehow distort the truth,” writes sociologist Joel Best in his book Damned Lies and Statistics. But, he explains, “[t]he solution to the problem of bad statistics is not to ignore all statistics, or to assume that every number is false. Some statistics are bad, but others are pretty good, and we need statistics—good statistics—to talk sensibly about social problems.” 

To help enhance our own statistical literacy as the pandemic continues, NYU News spoke with Andrew Gordon Wilson and Jonathan Niles-Weed, assistant professors at NYU’s Center for Data Science and Courant Institute of Mathematical Sciences, who outlined some principles to keep in mind when evaluating figures cited in the news. 

Their tips appear below, but both caution that training in data science alone isn’t enough to equip leaders to make perfect decisions.

“Many people—statisticians included—think that every problem can be solved by getting better data,” says Niles-Weed. “But even with perfect information, beating COVID will require politicians and public health experts to weigh very different considerations and make hard choices despite uncertainty. Data can help, but setting good policy also requires incorporating values and goals.” 

Be certain about the uncertainty in the data.

“Many of the facts and figures we see come with big unstated error bars,” warns Wilson. “Suppose the only person in a village tested for coronavirus tests positive. It could be reported that the incidence rate in that region is 100%. You might say, ‘Surely they need to test more people?’ But how many people should we test for an accurate incidence estimate? Ten people, 100 people, 10,000 people? What’s a reasonable sample size? And do we only test symptomatic people? What fraction of the population is asymptomatic? What constitutes ‘accurate’? Similarly, models predicting quantities such as incidence rate take many variables as input, such as case fatality rate. These inputs similarly have big uncertainty attached to them. We should be conscious of uncertainty in parsing numbers we see in the media—the point predictions, without reasonable estimates of the error bars, are often meaningless.” 
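Wilson's village example can be made concrete with a short calculation. The Python sketch below is an illustration, not something from the researchers: it assumes a simple random sample and uses the Wilson score interval, one standard way to put approximate 95% error bars on a proportion. The function name and the hypothetical 10% positive rate are assumptions chosen for the example. It does not address the harder questions raised above, such as testing only symptomatic people.

```python
import math

def wilson_interval(positives, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Assumes the n people tested are a simple random sample.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = positives / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)

# The village case: one person tested, one positive.
low, high = wilson_interval(1, 1)
print(f"1 of 1 positive: plausible rate from {low:.0%} to {high:.0%}")

# Hypothetical samples in which 10% of those tested are positive.
for n in (10, 100, 10_000):
    low, high = wilson_interval(n // 10, n)
    print(f"{n // 10} of {n} positive: {low:.1%} to {high:.1%}")
```

With a single positive test, the interval runs from roughly 21% to 100%, which is another way of saying the "100% incidence" figure carries almost no information. The interval narrows as the sample grows, but only if the sample is representative.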

Separate real trends from random occurrences.

“Random variation in data can easily be mistaken for a genuine trend,” says Niles-Weed. “Even if the underlying situation is static, data may change from day to day because of random noise. For example, if a state’s newly confirmed cases are particularly high during a given week and lower the next, it’s easy to interpret this as meaningful: perhaps the high caseload in one week made citizens more cautious, leading to a drop in cases the next week after behaviors changed. But it's just as likely that the first week was just a random outlier, and that nothing at all changed. By contrast, sustained day-over-day increases or decreases can indicate real trends.” 
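A small simulation shows how much weekly counts can move when nothing underneath has changed. This Python sketch is illustrative only: the population size, the fixed 1% weekly risk, and the function name are all assumptions, and each person is treated as an independent coin flip, which real outbreaks are not.

```python
import random

def simulate_weekly_cases(weeks=8, population=10_000, weekly_risk=0.01, seed=1):
    """Simulate weekly new-case counts when the true risk never changes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(weeks):
        # Each person independently becomes a case with the same fixed risk.
        cases = sum(1 for _ in range(population) if rng.random() < weekly_risk)
        counts.append(cases)
    return counts

for week, cases in enumerate(simulate_weekly_cases(), start=1):
    print(f"week {week}: {cases} new cases")
```

The expected count is 100 every week, yet the simulated numbers typically wander by ten or more in either direction. A single high week followed by a lower one is what a static situation looks like, which is why a sustained run of increases or decreases is the more reliable signal.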

Know what different probabilities can tell you—and what they can't. 

“It’s easy to confuse conditional probabilities, which is significant during a pandemic because it can lead to a misreading of testing data,” notes Wilson. “For example, in taking a test for coronavirus, we care about the probability that we have coronavirus given that we test positive—and not the probability that we test positive given that we have coronavirus.

“We have to carefully interpret what a probability is telling us. For example, the sensitivity of a test tells us the probability that we test positive, given that we have the condition. Similarly, another measure—the specificity—is the probability of a negative result if we don't have the condition. If a test has a high sensitivity, and is thus reported as highly accurate, it does not mean testing positive means we are likely to have coronavirus, especially if the general rate of coronavirus in the population is low. Similarly, if the general rate of coronavirus is high, a negative test result may have high probability of being a false negative, even when the test has high specificity.” 
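The distinction between these conditional probabilities follows from Bayes' rule, and a few lines of Python make it tangible. The sketch below is a worked illustration with made-up numbers: the sensitivity, specificity, and prevalence values are hypothetical and do not describe any real coronavirus test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(infected | positive test), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(not infected | negative test), via Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Hypothetical test: 95% sensitive, 95% specific, 1% of people infected.
ppv = positive_predictive_value(0.95, 0.95, 0.01)
print(f"Low prevalence: P(infected | positive) = {ppv:.1%}")

# Hypothetical test: 70% sensitive, 99% specific, 50% of people infected.
npv = negative_predictive_value(0.70, 0.99, 0.50)
print(f"High prevalence: P(not infected | negative) = {npv:.1%}")
```

In the first scenario, a positive result means only about a 16% chance of infection, even though the test is "95% accurate" on both measures. In the second, about 77% of negative results are correct, so nearly one negative in four is a false negative despite 99% specificity.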

Is your sample biased?

“While a truly random sample can give precise information about the whole population, bias can arise if some people are more likely to be included than others,” explains Niles-Weed. “For example, if a research team performs antibody tests on a random set of people walking down a city street, they will invariably miss those too sick to leave their beds. Data collected in this way can fail to be representative when extended to the whole population.” 
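The street-survey example can also be simulated. The following Python sketch is a toy model under loudly stated assumptions: 5% of a hypothetical city has antibodies, 40% of those people are too sick to go outside, and 1% of everyone else stays home for unrelated reasons. None of these figures comes from real data, and the function name is invented for the example.

```python
import random

def simulate_survey(population_size=200_000, sample_size=20_000, seed=0):
    """Compare a truly random sample with a sample of people on the street."""
    rng = random.Random(seed)
    people = []
    for _ in range(population_size):
        has_antibodies = rng.random() < 0.05           # assumed true rate
        p_home = 0.40 if has_antibodies else 0.01      # assumed chance of staying in
        on_street = rng.random() >= p_home
        people.append((has_antibodies, on_street))

    true_rate = sum(a for a, _ in people) / population_size

    # Every resident is equally likely to be chosen.
    random_sample = rng.sample(people, sample_size)
    random_rate = sum(a for a, _ in random_sample) / sample_size

    # Only people who are out walking can be chosen.
    street_pool = [p for p in people if p[1]]
    street_sample = rng.sample(street_pool, sample_size)
    street_rate = sum(a for a, _ in street_sample) / sample_size

    return true_rate, random_rate, street_rate

true_rate, random_rate, street_rate = simulate_survey()
print(f"true rate:       {true_rate:.1%}")
print(f"random sample:   {random_rate:.1%}")
print(f"street sample:   {street_rate:.1%}")
```

Under these assumptions the random sample lands close to the true rate, while the street sample comes in noticeably lower, near 3% rather than 5%. Testing more pedestrians would not help: a larger biased sample only yields a more precise estimate of the wrong number.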

What information is missing?          

“Many claims are factually correct but misleading due to crucial missing information,” says Wilson. “For example, it may be correct to say a majority of confirmed cases in a region are Asian, but if only a very small number of people had tested positive, that may not be a meaningful finding. Similarly, there are many correlations that can easily be explained away by missing causal factors. It was reported at one time that healthcare workers in New York have a slightly lower incidence of coronavirus than the general population. Does that mean social distancing is ineffective, since these workers will be more exposed to infected people? If we condition on the fact that healthcare workers are trained to be vigilant in mask wearing, hand-washing, distancing, and sanitization, it likely means the exact opposite!”

