Recap of substantive revision on 19 February 2020

On February 16, I (Mike Famulare) received an email from Neil Ferguson who kindly pointed out a technical error in the published estimate of the overall infection-fatality-ratio (IFR). Despite the attention paid to adjusting for delays in outcomes, I neglected to adjust the estimate of the case-ascertainment rate–the fraction of all infections reported as cases–for the delays from infection onset to symptom onset and case confirmation, and thus over-estimated the total number of infections denominator and under-estimated the IFR. With this additional adjustment and using the same methodology and data from the original version of this report, the corrected estimate of the infection-fatality-ratio (IFR) is 0.94 (0.37, 2.9) percent. While the uncertainty remains large, this estimate now excludes CDC’s reference estimate for the 1957 H2N2 pandemic flu (0.1 to 0.3 percent) and is only two-fold lower than and overlapping with CDC’s reference estimate of 2.04 percent for the 1918 H1N1 pandemic flu. This estimate from early data (globally representative of what was known at the time, but mostly from Wuhan) is also now compatible with the independent estimate published by the Imperial College group on February 10. While this correction does not change the assessment that COVID-19 exhibits severe pandemic potential in the absence of effective interventions, the shift in framing brought about by this three-fold revision from ‘possibly comparable to the 1957 flu but not 1918’ to ‘possibly comparable to 1918’ may meaningfully impact risk perception.

The document below has been edited from previous versions to correct the technical issue described above. Unless noted, the results reflect the state of data and interpretation as of February 4 and are otherwise unchanged. Please see WHO Situation Report 29 for a summary of public modeling on this topic to date, and seek more recent sources for updates in response to new data–especially regarding the impacts of age and comorbidities as recently published by China CDC (mirror).


In recent days, much clarity has been gained about the transmissibility of 2019-nCoV–that people may be infectious while exhibiting no or mild symptoms, that some shedders can display very high viral load, and that R0 in China prior to interventions is likely around 2.5 to 2.9. However, to fully anticipate the possible epidemiological impact of this pathogen, more clarity is needed around the confirmed-case-fatality-ratio (confirmed-CFR – fraction of confirmed cases that result in death) and infection-fatality-ratio (IFR – fraction of all infections that result in death). Early crude estimates of the confirmed-CFR based on the ratio of reported deaths to confirmed cases are hovering around 2-3 percent, but this naive calculation is incorrect because it does not account for the time required to progress from symptom onset to case confirmation to death. This report uses publicly-available data through January 31, 2020 and published modeling results to estimate the confirmed-CFR and IFR, adjusted for age and delays between case confirmation and death. With this information about mortality and estimates by others about transmissibility, we conclude with a preliminary assessment of the epidemiological risk posed by 2019-nCoV relative to seasonal and pandemic influenza.

Based on publicly-curated linelist data that has thorough case data through only January 15, the early evidence is that the confirmed-CFR has strong age structure, with highest risk of death in the elderly. Averaging over all confirmed case ages, the estimated overall confirmed-CFR was approximately 33 (29, 37) percent. This estimate, which is only well grounded in data through January 15, is roughly 10 times higher than the incorrect naive calculation mentioned above. With expanded diagnostic capacity and more sensitive case definitions, it is reasonable to expect this number will trend downward over time and we will update as more data becomes available.

(Paragraph revised 18 Feb.) The confirmed-CFR only describes cases that were confirmed, publicly reported, and summarized in a manner suitable for analysis. From the confirmed case data and the most recent mathematical transmission model published in the Lancet by Wu et al, we estimate that only 2.9 (1.3, 8.0) percent of infections had been reported as confirmed cases through January 25. Under the assumption that most infections that have gone unreported are not severe, the analyzed evidence indicates that the likely overall infection-fatality-ratio (IFR) is roughly 9.4 per 1000 (4.0, 26), more severe than the 1957 H2N2 influenza pandemic and only two-fold lower than CDC’s reference estimate of 2.04 percent for the 1918 H1N1 influenza pandemic, albeit with broad uncertainty.
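The arithmetic linking the two ratios can be sketched as follows. This is a minimal illustration, assuming that essentially all deaths occur among confirmed cases, so that the IFR is approximately the confirmed-CFR multiplied by the fraction of infections that are confirmed. It uses only the rounded point estimates quoted in this summary and does not propagate uncertainty; the variable names are illustrative.

```python
# Rounded point estimates quoted in this summary.
confirmed_cfr = 0.33        # fraction of confirmed cases that die
fraction_reported = 0.029   # fraction of infections reported as confirmed cases

# Assumption: unreported infections contribute a negligible number of deaths,
# so deaths / infections ~= (deaths / confirmed) * (confirmed / infections).
ifr = confirmed_cfr * fraction_reported

print(f"IFR ~= {1000 * ifr:.1f} per 1000 infections")
```

The product of the rounded inputs is about 9.6 per 1000, close to the reported 9.4 per 1000; the small gap is presumably due to rounding of the inputs, which is an assumption of this sketch rather than something stated in the report.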

From the early data, we estimated that the median time from hospitalization to death is 12.4 days, compatible with a recent publication in NEJM by Li et al using data from within China. However, from approximately Jan 13 through Jan 30, the official death count accumulated more quickly than can be explained by the confirmed case timeseries, the confirmed-CFR above, and the observed median duration to death. To reconcile the official death count with the confirmed case count, we find the most parsimonious explanation is that the median duration from hospitalization to death has dropped from 12.4 days to approximately 7 days since Jan 12. Summarizing the evidence we have so far on the progression from confirmed case to confirmed-case-fatality, a simple model to predict deaths from confirmed cases is to multiply confirmed cases by 0.33 and shift forward in time by 7 days. If correct, we expect the outbreak to reach 1000 cumulative deaths in the next week. (Added 18 Feb: The number of reported deaths exceeded 1000 on Feb 10, five days after the point estimate herein.) The continued reliability of this model will depend on stable data reporting, the age distribution of confirmed cases, and the effectiveness of hospital care, all of which may change over time and by location.
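The simple multiply-and-shift model described above can be written in a few lines. The sketch below is illustrative only: the function name and the case counts are hypothetical placeholders, not the timeseries analyzed in this report, and the model ignores the spread of delays around the median.

```python
from datetime import date, timedelta

CONFIRMED_CFR = 0.33  # point estimate of the confirmed-CFR from this report
DELAY_DAYS = 7        # approximate recent median delay to death

def predict_cumulative_deaths(cumulative_cases, cfr=CONFIRMED_CFR, delay_days=DELAY_DAYS):
    """Map {date: cumulative confirmed cases} to {date: predicted cumulative deaths}.

    Each cumulative case count is multiplied by the confirmed-CFR and
    shifted forward in time by the delay.
    """
    shift = timedelta(days=delay_days)
    return {day + shift: cfr * count for day, count in cumulative_cases.items()}

# Hypothetical cumulative case counts, for illustration only.
example_cases = {
    date(2020, 1, 20): 100,
    date(2020, 1, 21): 150,
    date(2020, 1, 22): 220,
}

predicted = predict_cumulative_deaths(example_cases)
for day in sorted(predicted):
    print(day.isoformat(), round(predicted[day], 1))
```

Because the shift is a fixed number of days, the prediction for any date depends only on the case count one delay earlier, which is what makes the model easy to check against each day's official death count.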

(Paragraph revised 18 Feb.) With combined evidence about transmissibility and severity, as viewed through CDC’s Novel Framework for Assessing Epidemiologic Effects of Influenza Epidemics and Pandemics (Reed et al 2013), evidence to date points toward 2019-nCoV having the potential to have comparable severity to the 1918 flu pandemic in the absence of effective control and treatment, when averaged across all ages.

Please note that all results summarized above are based on data and nascent transmission modeling from roughly the first month of the outbreak. Furthermore, the IFR and confirmed-CFR in cities and countries outside the current epidemic foci will depend on age-pyramids, distribution of comorbidities, access to care, and diagnostic capabilities, and transmissibility will depend on population structure, behavior, and effectiveness of interventions. All estimates and assessments are preliminary. We are providing them to guide decision making in the absence of better information, and they will continue to be revised or be superseded by the work of others as evidence warrants.

This analysis is made possible by the willingness of Chinese public health authorities to broadcast valuable information, the dedicated efforts of many volunteers to curate data in real-time, and rapid publication by experts working closely with patients and authorities. The spirit of open science around this outbreak is remarkable and is a great benefit to those preparing around the world.

Schematic relationships between all infections, confirmed cases, confirmed deaths, and time.

Figure 1 shows the schematic relationships between different measures of disease incidence and severity. Early in an outbreak, it is difficult to directly estimate the case-fatality-ratio among confirmed cases (confirmed-CFR) from cumulative case data because natural disease progression takes time and so deaths are delayed relative to cases. It is also more difficult to estimate the fatality ratio among all infections (the infection-fatality-ratio, IFR) because the true number of infections is not directly measurable. In this report, we use a small amount of linelist data with complete disease progression information from early in the outbreak to estimate the delays between case confirmation and death, and then the time- and age-adjusted confirmed-CFR. With an estimate of the true confirmed-CFR in hand, we look to incidence models to estimate the fraction of infections that are reported as confirmed cases, and thus estimate the IFR in the whole infected population.
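To see why the delay matters, consider a toy outbreak in which cases grow exponentially and every death occurs a fixed number of days after confirmation. The sketch below is a deterministic illustration under those simplifying assumptions; the growth rate, delay, and true CFR are hypothetical values chosen for the example, not estimates from this report.

```python
def simulate_outbreak(true_cfr=0.3, delay=7, growth=1.2, days=30):
    """Return (daily_cases, daily_deaths) for a toy exponentially growing outbreak.

    Assumption: every death occurs exactly `delay` days after confirmation.
    """
    daily_cases = [10 * growth ** t for t in range(days)]
    daily_deaths = [
        true_cfr * daily_cases[t - delay] if t >= delay else 0.0
        for t in range(days)
    ]
    return daily_cases, daily_deaths

DELAY = 7
cases, deaths = simulate_outbreak(delay=DELAY)

# Naive estimate: cumulative deaths over cumulative cases on the same day.
naive_cfr = sum(deaths) / sum(cases)

# Delay-adjusted estimate: compare deaths only with the cases that were
# confirmed early enough for their outcome to have been observed.
adjusted_cfr = sum(deaths) / sum(cases[: len(cases) - DELAY])

print(f"naive: {naive_cfr:.3f}  delay-adjusted: {adjusted_cfr:.3f}")
```

In this toy setting the naive ratio comes out well below the true value of 0.3, while the delay-adjusted ratio recovers it, because most cumulative cases in a growing outbreak are too recent to have reached an outcome.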

Figure 1. Schematic relationships between deaths, confirmed cases, total infections, and time. Accurate estimates of the confirmed-CFR and IFR require adjusting for the delays between incidence, case confirmation, and death.



Data. To characterize the case-fatality ratio among confirmed cases–the confirmed-case-fatality-ratio (confirmed-CFR)–we analyzed case report data recorded in the “Kudos to linelist” as of the Jan 27 update. For cases who died or recovered, we went back to the original media links and added columns to capture date of death or hospital discharge. The augmented dataset is available on github and we’ve contacted the linelist maintainer to incorporate it. The dataset appears to overlap with the recent publications in the Lancet by Chen et al and NEJM by Li et al but we haven’t yet checked in detail. Of 275 cases listed, age is reported for 257. The final outcome of death is known for 39 and recovery for 13. The outcomes are missing for the remaining 223 and we assume they are right-censored in the analyses below. For recovered cases, the recovery date is defined as the date released from hospitalization.

To examine changes in the age distribution of confirmed cases after Jan 17 – when reporting rates appear to have increased – we analyzed the additional age data in the more comprehensive nCoV2019_2020_line_list_open linelist as of the morning of Jan 30, for a total of 318 cases with known age and 4984 with unknown.

Both linelists are lacking most outcome information and are lagging authoritative case reports. For the cumulative confirmed case and reported death timeseries, we used data aggregated at Wikipedia: Timeline_of_the_2019-2020_Wuhan_coronavirus_outbreak and made available on Github.

Instead of traditional citation endnotes, all sources are cross-referenced with hyperlinks embedded in the text.

Methods. The Kaplan-Meier estimator was used to characterize the distribution of times from symptom onset to death. When final outcomes are missing (death or recovered), we assume the outcomes are right-censored–not yet known and/or lost to follow-up. To estimate the case-fatality-rates by age, we used logistic regression with a generalized additive model to smooth raw observations and extrapolate to younger ages not yet included in the dataset.
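For readers unfamiliar with the Kaplan-Meier estimator, the following is a minimal pure-Python sketch of the product-limit calculation with right-censoring. It is not the code used for this report; the function names are illustrative and the durations are hypothetical toy values.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct observed event time.

    durations: time from onset to death or to censoring, one per case.
    observed:  True if the death was observed, False if right-censored.
    """
    event_times = sorted({t for t, d in zip(durations, observed) if d})
    survival = 1.0
    curve = []
    for t in event_times:
        # Cases still under observation just before time t.
        at_risk = sum(1 for u in durations if u >= t)
        # Deaths observed exactly at time t.
        deaths = sum(1 for u, d in zip(durations, observed) if d and u == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

def median_time(curve):
    """First time at which the survival curve drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical toy data: days to death (True) or to censoring (False).
toy_durations = [5, 8, 8, 12, 15, 20]
toy_observed = [True, True, False, True, False, True]

toy_curve = kaplan_meier(toy_durations, toy_observed)
print(toy_curve)
print("median:", median_time(toy_curve))
```

Censored cases remain in the at-risk count until their censoring time and then drop out without counting as deaths, which is how the estimator uses the partial information from cases whose outcomes are not yet known.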

Definitions. Confirmed cases: confirmed positive by PCR. Confirmed-case-fatality-ratio (confirmed-CFR): expected fraction of confirmed cases that will die. Infection-fatality-ratio (IFR): estimated fraction of all infections (confirmed and suspected and unreported) that will die.


Duration from symptom onset to death or recovery from confirmed cases.

We analyzed publicly-curated data describing the time from symptom onset to death or recovery (marked by release from hospitalization) taken from the “Kudos to linelist” as described in the Methods above. Figure 2 describes the time series of symptom onset and death, recovery, or right-censoring (unknown or as-yet-undetermined outcome). The two most relevant features of this data are (1) outcomes are mostly unknown for cases with onset after January 15, and (2) deaths and recoveries lag symptom onset.

Figure 2. Time series of cases analyzed. Top: fraction with each outcome by symptom onset date. Middle: case counts by symptom onset date. Bottom: time of death or recovery for cases with known outcome.


Duration from hospitalization to death from confirmed cases prior to Jan 15

Figure 3 shows the distribution of durations from hospitalization to death for confirmed cases prior to Jan 15. We find the median time from hospitalization to death in the population studied is 12.4 days. This is consistent with results from Li et al in NEJM, who report that the mean time from symptom onset to hospitalization was 12.5 days in cases before Jan 1.