Data scrapers

These scripts pull data from various sources for use in Covasim. To run all scrapers, type:

./run_scrapers
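As a rough illustration of what running all scrapers involves, the sketch below runs each loader script named in this document in turn. It is only an assumption that `run_scrapers` works this way, since its contents are not shown here; the names `LOADERS`, `build_commands`, and `run_all` are hypothetical.

```python
import subprocess
import sys

# Loader scripts named in this document, in the order they are described.
LOADERS = [
    "data/load_corona_data_scraper_data.py",
    "data/load_ecdp_data.py",
    "data/load_covid_tracking_project_data.py",
    "data/load_demographic_data.py",
]

def build_commands(loaders=LOADERS, python=sys.executable):
    """Return one command (as an argument list) per loader script."""
    return [[python, path] for path in loaders]

def run_all(loaders=LOADERS):
    """Run each loader in turn, stopping at the first failure (assumed behavior)."""
    for cmd in build_commands(loaders):
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root would then launch the four loaders one after another.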

1. Corona Data Scraper

To quote the Corona Data Scraper web page,

“Corona Data Scraper pulls COVID-19 Coronavirus case data from verified sources.”

The loader below scrapes this data and places it, in CSV format, in the data/epi_data/corona-data-scraper-project directory.

Here is a sample of the data. The full set of columns is key, population, aggregate, cum_positives, cum_death, cum_recovered, cum_active, cum_tests, cum_hospitalized, cum_discharged, date, day, positives, death, tests, hospitalized, discharged, recovered, and active. Only the columns populated in these rows are shown; the first column is the row index, and blank cells are missing values.

| | key | population | aggregate | cum_positives | cum_tests | date | day | positives | tests |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 57089 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-21 | 0 | 1.0 | |
| 57090 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-22 | 1 | 0.0 | |
| 57091 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-23 | 2 | 0.0 | |
| 57092 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-24 | 3 | 0.0 | |
| 57093 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-25 | 4 | 0.0 | |
| 57094 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-26 | 5 | 0.0 | |
| 57095 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-27 | 6 | 0.0 | |
| 57096 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | 2020-03-28 | 7 | 0.0 | |
| 57097 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | 2020-03-29 | 8 | 1.0 | |
| 57098 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | 2020-03-30 | 9 | 0.0 | |
| 57099 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | 88.0 | 2020-03-31 | 10 | 0.0 | 88.0 |
| 57100 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | 91.0 | 2020-04-01 | 11 | 0.0 | 3.0 |
| 57101 | Roane County, Tennessee, United States | 53382.0 | county | 3.0 | 131.0 | 2020-04-02 | 12 | 1.0 | 40.0 |
| 57102 | Roane County, Tennessee, United States | 53382.0 | county | 3.0 | 150.0 | 2020-04-03 | 13 | 0.0 | 19.0 |
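In the sample above, the daily columns (positives, tests) appear to be day-over-day differences of the cumulative columns (cum_positives, cum_tests). The sketch below reproduces that relationship using the sample values; the function name `daily_from_cumulative` is hypothetical, and treating the first day's increment as the first cumulative value is an assumption based on the sample rows, not on the loader's source.

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative series into per-day increments.

    The first increment is taken to be the first cumulative value
    (an assumption that matches the sample rows).
    """
    daily = []
    previous = 0.0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily

# cum_positives for Roane County, 2020-03-21 through 2020-04-03 (from the sample)
cum_positives = [1.0] * 8 + [2.0] * 4 + [3.0] * 2
positives = daily_from_cumulative(cum_positives)

# cum_tests for 2020-03-31 through 2020-04-03 (from the sample)
cum_tests = [88.0, 91.0, 131.0, 150.0]
tests = daily_from_cumulative(cum_tests)
```

The resulting `positives` and `tests` lists can be compared directly against the corresponding columns of the sample.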

Updating

To update the Corona Data Scraper data,

python data/load_corona_data_scraper_data.py

As of April 4, 2020, there are apparently 3874 data sets.

2. European Centre for Disease Prevention and Control

To quote the European Centre for Disease Prevention and Control web page,

“Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has been collecting the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process is carried out on a daily basis. To insure the accuracy and reliability of the data, this process is being constantly refined. This helps to monitor and interpret the dynamics of the COVID-19 pandemic not only in the European Union (EU), the European Economic Area (EEA), but also worldwide.”

The data is stored in CSV format in the data/epi_data/european-centre-for-disease-prevention-and-control directory.

Here is a sample of the data:

| | day | new_positives | new_death | key | population | date |
| --- | --- | --- | --- | --- | --- | --- |
| 3960 | 0 | 2 | 0 | Greenland | 56025.0 | 2020-03-20 |
| 3959 | 1 | 0 | 0 | Greenland | 56025.0 | 2020-03-21 |
| 3958 | 2 | 0 | 0 | Greenland | 56025.0 | 2020-03-22 |
| 3957 | 3 | 0 | 0 | Greenland | 56025.0 | 2020-03-23 |
| 3956 | 4 | 2 | 0 | Greenland | 56025.0 | 2020-03-24 |
| 3955 | 5 | 0 | 0 | Greenland | 56025.0 | 2020-03-25 |
| 3954 | 6 | 1 | 0 | Greenland | 56025.0 | 2020-03-26 |
| 3953 | 7 | 1 | 0 | Greenland | 56025.0 | 2020-03-27 |
| 3952 | 8 | 3 | 0 | Greenland | 56025.0 | 2020-03-28 |
| 3951 | 9 | 1 | 0 | Greenland | 56025.0 | 2020-03-29 |
| 3950 | 10 | 0 | 0 | Greenland | 56025.0 | 2020-03-30 |
| 3949 | 11 | 0 | 0 | Greenland | 56025.0 | 2020-03-31 |
| 3948 | 12 | 0 | 0 | Greenland | 56025.0 | 2020-04-01 |
| 3947 | 13 | 0 | 0 | Greenland | 56025.0 | 2020-04-02 |
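Because this data set reports daily counts (new_positives, new_death) rather than running totals, a cumulative series has to be computed by the reader. The sketch below does this with the standard library for the first five sample rows. The CSV layout in `SAMPLE_CSV` (column order, no index column) is an assumption for illustration, and `cumulative_positives` is a hypothetical helper, not part of the loader.

```python
import csv
import io
from itertools import accumulate

# First five sample rows, written out as CSV text using the sample's column names.
SAMPLE_CSV = """day,new_positives,new_death,key,population,date
0,2,0,Greenland,56025.0,2020-03-20
1,0,0,Greenland,56025.0,2020-03-21
2,0,0,Greenland,56025.0,2020-03-22
3,0,0,Greenland,56025.0,2020-03-23
4,2,0,Greenland,56025.0,2020-03-24
"""

def cumulative_positives(csv_text, key):
    """Return (date, running total of new_positives) pairs for one key."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["key"] == key]
    rows.sort(key=lambda r: int(r["day"]))  # make sure days are in order
    totals = accumulate(int(r["new_positives"]) for r in rows)
    return [(r["date"], total) for r, total in zip(rows, totals)]

result = cumulative_positives(SAMPLE_CSV, "Greenland")
```

Sorting by day before accumulating matters here because the row index in the sample runs in the opposite direction to the day column.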

Updating

To update the European Centre for Disease Prevention and Control data,

python data/load_ecdp_data.py

This adds 10,538 records (as of April 15, 2020) covering countries and territories in Africa, Asia, the Americas, Europe, and Oceania. More details at: https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases

3. The COVID Tracking Project

The COVID Tracking Project “obtains, organizes, and publishes high-quality data required to understand and respond to the COVID-19 outbreak in the United States.” The project website is https://covidtracking.com

We transform this data into the Covasim parameter format. It is stored in CSV format in the data/epi_data/covid-tracking-project directory.

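Here is a sample of the data. The full set of columns is date, key, cum_hospitalized, cum_in_icu, cum_on_ventilator, death, new_death, new_hospitalized, new_negatives, new_positives, new_tests, day, num_icu, and num_on_ventilator. Only the columns populated in these rows are shown; the first column is the row index, and blank cells are missing values.

| | date | key | death | new_death | new_hospitalized | new_negatives | new_positives | new_tests | day |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2210 | 2020-03-04 | NY | | | | | | | 0 |
| 2191 | 2020-03-05 | NY | | 0.0 | 0.0 | 28.0 | 16.0 | 44.0 | 1 |
| 2163 | 2020-03-06 | NY | | 0.0 | 0.0 | 16.0 | 11.0 | 27.0 | 2 |
| 2122 | 2020-03-07 | NY | | 0.0 | 0.0 | 0.0 | 43.0 | 43.0 | 3 |
| 2071 | 2020-03-08 | NY | | 0.0 | 0.0 | 0.0 | 29.0 | 29.0 | 4 |
| 2020 | 2020-03-09 | NY | | 0.0 | 0.0 | 0.0 | 37.0 | 37.0 | 5 |
| 1969 | 2020-03-10 | NY | | 0.0 | 0.0 | 0.0 | 31.0 | 31.0 | 6 |
| 1918 | 2020-03-11 | NY | | 0.0 | 0.0 | 0.0 | 43.0 | 43.0 | 7 |
| 1867 | 2020-03-12 | NY | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8 |
| 1816 | 2020-03-13 | NY | | 0.0 | 0.0 | 2687.0 | 205.0 | 2892.0 | 9 |
| 1765 | 2020-03-14 | NY | | 0.0 | 0.0 | 0.0 | 103.0 | 103.0 | 10 |
| 1714 | 2020-03-15 | NY | 3.0 | 3.0 | 0.0 | 1764.0 | 205.0 | 1969.0 | 11 |
| 1661 | 2020-03-16 | NY | 7.0 | 4.0 | 0.0 | 0.0 | 221.0 | 221.0 | 12 |
| 1605 | 2020-03-17 | NY | 7.0 | 0.0 | 0.0 | 963.0 | 750.0 | 1713.0 | 13 |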

Updating

To update the COVID Tracking Project data,

python data/load_covid_tracking_project_data.py

4. Demographic data scraper

To scrape demographic data, run

python data/load_demographic_data.py