Data scrapers
These scripts pull data from various sources for use in Covasim. To run all scrapers, simply type
./run_scrapers
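The contents of run_scrapers are not shown on this page. This is a rough, hypothetical Python equivalent. It assumes the script simply runs, from the repository root, the three loader scripts named in the Updating sections below. `LOADERS` and `run_all` are illustrative names, not part of Covasim.

```python
import subprocess
import sys

# Hypothetical list: the loader scripts named on this page, relative to the repo root.
LOADERS = [
    "data/load_corona_data_scraper_data.py",
    "data/load_ecdp_data.py",
    "data/load_covid_tracking_project_data.py",
]


def run_all(scripts, python=sys.executable):
    """Run each script with the given interpreter; return the scripts that failed."""
    failed = []
    for script in scripts:
        # check=False: a failure in one source should not stop the remaining loaders
        result = subprocess.run([python, script], check=False)
        if result.returncode != 0:
            failed.append(script)
    return failed
```

From the repository root, `run_all(LOADERS)` would run every loader and return any that exited with a non-zero status. Continuing past failures is a design choice here, so one unreachable source does not block the others.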
1. Corona Data Scraper
To quote the Corona Data Scraper web page,
Corona Data Scraper pulls COVID-19 Coronavirus case data from verified sources.
These data are scraped by the loader described below and placed in the data/epi_data/corona-data-scraper-project directory in CSV format.
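This is a minimal sketch for reading one of these CSV files with Python's standard csv module. It assumes the first line holds column names like those in the sample below and that empty cells mean missing values. `parse_cell` and `read_epi_csv` are hypothetical helpers, not Covasim functions.

```python
import csv
import io


def parse_cell(value):
    """Empty cells become None, numeric cells become float, anything else stays a string."""
    if value is None or value == "":
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return value  # e.g. key, aggregate, date


def read_epi_csv(fileobj):
    """Read a scraped CSV file (any file-like object) into a list of dicts."""
    reader = csv.DictReader(fileobj)
    return [{name: parse_cell(cell) for name, cell in row.items()} for row in reader]


# Example with an in-memory excerpt of the first sample row
sample = io.StringIO(
    "key,population,aggregate,cum_positives,cum_tests,date,day\n"
    '"Roane County, Tennessee, United States",53382.0,county,1.0,,2020-03-21,0\n'
)
rows = read_epi_csv(sample)
```

For a real file, pass a handle opened with `open(path, newline="")`. The file names inside the directory are not listed here, so `path` is up to you.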
Here is a sample of the data.

| | key | population | aggregate | cum_positives | cum_death | cum_recovered | cum_active | cum_tests | cum_hospitalized | cum_discharged | date | day | positives | death | tests | hospitalized | discharged | recovered | active |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 57089 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-21 | 0 | 1.0 | | | | | | |
| 57090 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-22 | 1 | 0.0 | | | | | | |
| 57091 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-23 | 2 | 0.0 | | | | | | |
| 57092 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-24 | 3 | 0.0 | | | | | | |
| 57093 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-25 | 4 | 0.0 | | | | | | |
| 57094 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-26 | 5 | 0.0 | | | | | | |
| 57095 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-27 | 6 | 0.0 | | | | | | |
| 57096 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-28 | 7 | 0.0 | | | | | | |
| 57097 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | | | | 2020-03-29 | 8 | 1.0 | | | | | | |
| 57098 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | | | | 2020-03-30 | 9 | 0.0 | | | | | | |
| 57099 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | 88.0 | | | 2020-03-31 | 10 | 0.0 | | 88.0 | | | | |
| 57100 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | 91.0 | | | 2020-04-01 | 11 | 0.0 | | 3.0 | | | | |
| 57101 | Roane County, Tennessee, United States | 53382.0 | county | 3.0 | | | | 131.0 | | | 2020-04-02 | 12 | 1.0 | | 40.0 | | | | |
| 57102 | Roane County, Tennessee, United States | 53382.0 | county | 3.0 | | | | 150.0 | | | 2020-04-03 | 13 | 0.0 | | 19.0 | | | | |

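In this sample, the daily columns look like first differences of the cumulative ones. For example, cum_tests reads 88.0, 91.0, 131.0, 150.0 while tests reads 88.0, 3.0, 40.0, 19.0. The sketch below reproduces that relationship. It is our reading of the sample, not the loader's actual code, and `cumulative_to_daily` is a hypothetical name.

```python
def cumulative_to_daily(cumulative):
    """Convert a cumulative series into per-day increments.

    Missing values (None) stay missing. The baseline before the first reported
    value is 0, and a gap does not reset it: the next reported value is compared
    with the last value that was reported.
    """
    daily = []
    previous = 0.0  # assumed baseline before any data is reported
    for value in cumulative:
        if value is None:
            daily.append(None)
            continue
        daily.append(value - previous)
        previous = value
    return daily
```

Applied to the cum_positives column above, this yields exactly the positives column: 1.0 on day 0, then 1.0 again on days 8 and 12, and 0.0 otherwise.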
Updating
To update the Corona Data Scraper data, run:
python data/load_corona_data_scraper_data.py
As of April 4, 2020, there were apparently 3,874 data sets.
2. European Centre for Disease Prevention and Control
To quote the European Centre for Disease Prevention and Control web page,
Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has been collecting the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process is carried out on a daily basis. To insure the accuracy and reliability of the data, this process is being constantly refined. This helps to monitor and interpret the dynamics of the COVID-19 pandemic not only in the European Union (EU), the European Economic Area (EEA), but also worldwide.
The data is stored in CSV format in the data/epi_data/european-centre-for-disease-prevention-and-control directory.
Here is a sample of the data:

| | day | new_positives | new_death | key | population | date |
|---|---|---|---|---|---|---|
| 3960 | 0 | 2 | 0 | Greenland | 56025.0 | 2020-03-20 |
| 3959 | 1 | 0 | 0 | Greenland | 56025.0 | 2020-03-21 |
| 3958 | 2 | 0 | 0 | Greenland | 56025.0 | 2020-03-22 |
| 3957 | 3 | 0 | 0 | Greenland | 56025.0 | 2020-03-23 |
| 3956 | 4 | 2 | 0 | Greenland | 56025.0 | 2020-03-24 |
| 3955 | 5 | 0 | 0 | Greenland | 56025.0 | 2020-03-25 |
| 3954 | 6 | 1 | 0 | Greenland | 56025.0 | 2020-03-26 |
| 3953 | 7 | 1 | 0 | Greenland | 56025.0 | 2020-03-27 |
| 3952 | 8 | 3 | 0 | Greenland | 56025.0 | 2020-03-28 |
| 3951 | 9 | 1 | 0 | Greenland | 56025.0 | 2020-03-29 |
| 3950 | 10 | 0 | 0 | Greenland | 56025.0 | 2020-03-30 |
| 3949 | 11 | 0 | 0 | Greenland | 56025.0 | 2020-03-31 |
| 3948 | 12 | 0 | 0 | Greenland | 56025.0 | 2020-04-01 |
| 3947 | 13 | 0 | 0 | Greenland | 56025.0 | 2020-04-02 |

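Two relationships seem to hold in the samples on this page. First, `day` counts days from the first date shown: here 2020-03-20 is day 0 and 2020-04-02 is day 13. Second, this file carries only daily counts, so cumulative totals have to be summed. This is a small sketch of both, under those assumptions. `day_index` and `running_total` are hypothetical helper names.

```python
from datetime import date
from itertools import accumulate


def day_index(dates):
    """Days elapsed since the first date in a list of YYYY-MM-DD strings."""
    parsed = [date.fromisoformat(d) for d in dates]
    if not parsed:
        return []
    start = parsed[0]
    return [(d - start).days for d in parsed]


def running_total(new_counts):
    """Cumulative totals from daily counts, e.g. new_positives -> total cases so far."""
    return list(accumulate(new_counts))
```

Applied to the Greenland new_positives column above, running_total reaches 10 by 2020-04-02.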
Updating
To update the European Centre for Disease Prevention and Control data, run:
python data/load_ecdp_data.py
This adds data for countries and territories across Africa, Asia, the Americas, Europe, and Oceania (10,538 entries as of April 15, 2020). More details are available at https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
3. The COVID Tracking Project
The COVID Tracking Project “obtains, organizes, and publishes high-quality data required to understand and respond to the COVID-19 outbreak in the United States.” The project website is https://covidtracking.com
We transform this data into the Covasim parameter format and store it in CSV format in the data/epi_data/covid-tracking-project directory. Here is a sample of the data:

| | date | key | cum_hospitalized | cum_in_icu | cum_on_ventilator | death | new_death | new_hospitalized | new_negatives | new_positives | new_tests | day | num_icu | num_on_ventilator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2210 | 2020-03-04 | NY | | | | | | | | | | 0 | | |
| 2191 | 2020-03-05 | NY | | | | | 0.0 | 0.0 | 28.0 | 16.0 | 44.0 | 1 | | |
| 2163 | 2020-03-06 | NY | | | | | 0.0 | 0.0 | 16.0 | 11.0 | 27.0 | 2 | | |
| 2122 | 2020-03-07 | NY | | | | | 0.0 | 0.0 | 0.0 | 43.0 | 43.0 | 3 | | |
| 2071 | 2020-03-08 | NY | | | | | 0.0 | 0.0 | 0.0 | 29.0 | 29.0 | 4 | | |
| 2020 | 2020-03-09 | NY | | | | | 0.0 | 0.0 | 0.0 | 37.0 | 37.0 | 5 | | |
| 1969 | 2020-03-10 | NY | | | | | 0.0 | 0.0 | 0.0 | 31.0 | 31.0 | 6 | | |
| 1918 | 2020-03-11 | NY | | | | | 0.0 | 0.0 | 0.0 | 43.0 | 43.0 | 7 | | |
| 1867 | 2020-03-12 | NY | | | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8 | | |
| 1816 | 2020-03-13 | NY | | | | | 0.0 | 0.0 | 2687.0 | 205.0 | 2892.0 | 9 | | |
| 1765 | 2020-03-14 | NY | | | | | 0.0 | 0.0 | 0.0 | 103.0 | 103.0 | 10 | | |
| 1714 | 2020-03-15 | NY | | | | 3.0 | 3.0 | 0.0 | 1764.0 | 205.0 | 1969.0 | 11 | | |
| 1661 | 2020-03-16 | NY | | | | 7.0 | 4.0 | 0.0 | 0.0 | 221.0 | 221.0 | 12 | | |
| 1605 | 2020-03-17 | NY | | | | 7.0 | 0.0 | 0.0 | 963.0 | 750.0 | 1713.0 | 13 | | |

Updating
To update the COVID Tracking Project data, run:
python data/load_covid_tracking_project_data.py