Overview

Our group at UC Berkeley is working to help forecast the severity of the epidemic for both individual counties and individual hospitals. As a byproduct, we have produced and will continue to produce models, visualizations, and curated datasets (including confirmed cases/deaths, demographics, risk factors, and social distancing data) that can be used by other teams in the fight against COVID-19. We are collaborating with Response4Life, a non-profit organization whose goal is to blunt the effect of COVID-19 through the production and appropriate distribution of PPE, medical equipment, and medical personnel to healthcare facilities across the United States.

Visualizations (updated daily): see the project website
Data (updated daily): We have compiled and cleaned a large corpus of hospital- and county-level data from a variety of public sources to aid data science efforts to combat COVID-19.
- At the hospital level, the data include the location of the hospital, the number of ICU beds, the total number of employees, the hospital type, and contact information
- At the county level, our data include socioeconomic factors, social distancing scores, and COVID-19 cases/deaths from USA Facts and NYT
Modeling: Using these data, we have developed a short-term (3-5 days) forecasting model for mortality at the county level. This model combines a county-specific exponential growth model and a shared exponential growth model through a weighted average, where the weights depend on past prediction accuracy.
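The weighted-average idea described above can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the log-linear least-squares fit, and the exponential-in-error weighting (with a hypothetical constant `c`) are all assumptions chosen for illustration. The source states only that the weights depend on past prediction accuracy.

```python
import numpy as np

def fit_exponential(deaths):
    """Fit log(deaths + 1) = a + b * t by least squares and return (a, b).

    A simple stand-in for an exponential growth model (assumption).
    """
    t = np.arange(len(deaths))
    y = np.log(np.asarray(deaths, dtype=float) + 1.0)
    b, a = np.polyfit(t, y, 1)  # polyfit returns [slope, intercept] for degree 1
    return a, b

def predict_exponential(a, b, t):
    """Predicted death count at time index t under the fitted model."""
    return np.exp(a + b * t) - 1.0

def combine(pred_county, pred_shared, err_county, err_shared, c=1.0):
    """Weighted average of two predictions.

    Each weight decays exponentially with that model's past error, so the
    model that has been more accurate recently counts for more.
    The weighting scheme and the constant c are hypothetical.
    """
    w_county = np.exp(-c * err_county)
    w_shared = np.exp(-c * err_shared)
    return (w_county * pred_county + w_shared * pred_shared) / (w_county + w_shared)
```

With equal past errors the two predictions are averaged evenly; as one model's past error grows, the combined prediction moves toward the other model.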
Severity index: The COVID pandemic severity index (CPSI) is designed to aid the distribution of medical resources to hospitals. It takes one of three values (3: High, 2: Medium, 1: Low), indicating the severity of the COVID-19 outbreak at a hospital on a given day. It is calculated in three steps.
1. county-level predictions for the number of deaths are modeled
2. county-level predictions are allocated to hospitals within each county in proportion to their total number of employees
3. the final value is decided by thresholding the number of cumulative predicted deaths for a hospital (= current recorded deaths + predicted future deaths)
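Steps 2 and 3 above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the two thresholds are placeholders because the source does not state the actual cut-offs. It also assumes the county total being allocated is already the cumulative predicted count (recorded plus predicted deaths).

```python
def allocate_to_hospitals(county_deaths, employees):
    """Split a county-level death count across hospitals.

    employees maps hospital name -> total number of employees; each
    hospital receives a share proportional to its employee count.
    """
    total = sum(employees.values())
    if total == 0:
        # No employee information: nothing sensible to allocate.
        return {name: 0.0 for name in employees}
    return {name: county_deaths * n / total for name, n in employees.items()}

def severity_index(cumulative_predicted_deaths, medium_threshold, high_threshold):
    """Map a hospital's cumulative predicted deaths to 1 (Low), 2 (Medium), 3 (High).

    Both thresholds are hypothetical parameters, not values from the project.
    """
    if cumulative_predicted_deaths >= high_threshold:
        return 3
    if cumulative_predicted_deaths >= medium_threshold:
        return 2
    return 1

# Example with made-up numbers: a county with 100 cumulative predicted deaths
# and two hospitals.
shares = allocate_to_hospitals(100, {"Hospital A": 300, "Hospital B": 100})
levels = {name: severity_index(d, medium_threshold=10, high_threshold=50)
          for name, d in shares.items()}
```

Keeping the thresholds as explicit arguments makes it easy to see how sensitive the index is to where the cut-offs are placed.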

Quickstart with the data + models

Data

Download the processed data (as a pickled dataframe, df_county_level_cached.pkl) from this folder and place it in the data directory.
The data can then be loaded and merged:

import load_data
df = load_data.load_county_level(data_dir='/path/to/data')
print(df.shape)

for more data details, see ./data/readme.md
note: (non-cumulative) daily cases + deaths are in data/usafacts/confirmed_cases.csv and data/usafacts/deaths.csv (updated daily)
note: abridged csv with county-level info such as demographics, hospital information, risk factors, social distancing, and voting data is at data/df_county_level_abridged_cached.csv
we are constantly monitoring and adding new data sources
- we are keeping track of relevant data news here
output from running the daily tests is stored here
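For readers who only need the abridged county-level CSV mentioned above, a minimal pandas sketch is shown here. The helper name `load_abridged` and the column name `countyFIPS` are assumptions for illustration; check the actual file header (or ./data/readme.md) for the real identifier column. Reading the identifier as a string is a common precaution so that FIPS codes with leading zeros are not converted to integers.

```python
import pandas as pd

def load_abridged(csv_path, fips_col="countyFIPS"):
    """Read the abridged county-level CSV into a DataFrame.

    fips_col is a hypothetical column name; it is read as a string so that
    codes such as "01001" keep their leading zero.
    """
    return pd.read_csv(csv_path, dtype={fips_col: str})

# Usage (path taken from the note above, relative to the repository root):
# df_abridged = load_abridged("data/df_county_level_abridged_cached.csv")
# print(df_abridged.shape)
```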

Prediction

To get death predictions from our current best-performing model, the simplest way is to call the following (for more details, see ./modeling/readme.md):

from modeling.fit_and_predict import add_preds
df = add_preds(df, NUM_DAYS_LIST=[1, 3, 5]) # adds keys like "Predicted Deaths 1-day", "Predicted Deaths 3-day"
# NUM_DAYS_LIST is list of number of days in the future to predict

Related county-level projects

Acknowledgements

The UC Berkeley Departments of Statistics and EECS, led by Professor Bin Yu (group members are listed alphabetically by last name)

Yu group team (Data/modeling): Nick Altieri, Rebecca Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robbie Netzorg, Briton Park, Chandan Singh (student lead), Yan Shuo Tan, Tiffany Tang, Yu Wang
The Response4Life team and volunteers (Organization/distribution)
Kolak group team (Geospatial visualization): Qinyun Lin
Medical team (Advice from a medical perspective): Roger Chaufournier, Aaron Kornblith, David Jaffe
Shen Group team (IEOR): Junyu Cao, Shunan Jiang, Pelagie Elimbi Moudio
Helpful input from many including: SriSatish Ambati, Rob Crockett, Marty Elisco, Valerie Karplus, Andreas Lange, Samuel Scarpino, Suzanne Tamang, Tarek Zohdi
