An app to intelligently search through the COVID-19 Open Research Dataset (CORD-19) and find similar papers

laranea/Covid19

 
 


Covid19

An app to intelligently search through the COVID-19 Open Research Dataset (CORD-19) and find similar papers, powered by machine learning and NLP, all with a sleek UI.

Note: Project Under Development

Download Resources

Download the model file cord19-300d.magnitude (644.95 MB) from here and, after unzipping, place it in the resources directory at the repository root. Note: You will need to log in to Kaggle.

For now, the app uses only the metadata file from Semantic Scholar. A sample_metadata.csv with only 1000 documents is included so the app works out of the box. To fetch the full data, run the app and click Update Data.
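The relationship between the bundled sample and the full metadata file can be sketched as a simple fallback. This is only an illustration using the standard library: `pick_metadata_file` and `count_rows` are hypothetical helpers, not functions from this repository, and the app's own loading logic may differ.

```python
import csv
from pathlib import Path


def pick_metadata_file(data_dir):
    """Prefer the full metadata.csv; fall back to the bundled sample."""
    data_dir = Path(data_dir)
    full = data_dir / "metadata.csv"
    sample = data_dir / "sample_metadata.csv"
    return full if full.exists() else sample


def count_rows(csv_path):
    """Count data rows (excluding the header) in a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)
```

Called from the src directory as `pick_metadata_file("../data")`, this would return the sample file until an update has created metadata.csv.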

The directory structure should look like this:

.
├── LICENSE
├── README.md
├── data
│   ├── sample_metadata.csv
│   ├── metadata.csv (will be created on first update run from the app)
│   ├── metadata_processed.pickle (will be created on first update run from the app)
├── resources
│   ├── cord19-300d.magnitude (You need to download this file, keep the name same or change in config.cfg)
├── src
│   ├── cord19_app.py
│   ├── config.cfg
│   ├── data_io.py
│   ├── embedding.py
│   ├── tokenizer.py
│   ├── utils.py
│   └── vectorizer.py

... (excluded others)
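Before launching the app, it can help to confirm that the layout above is in place. The sketch below is an assumption-laden illustration: the `[resources]` section and `model_file` key are hypothetical stand-ins, since the actual contents of config.cfg are not shown here, and `model_path` and `missing_files` are not part of the repository.

```python
import configparser
from pathlib import Path

# Hypothetical config contents; the real config.cfg may use other names.
EXAMPLE_CFG = """
[resources]
model_file = cord19-300d.magnitude
"""


def model_path(root, cfg_text=EXAMPLE_CFG):
    """Resolve the model file location from the (hypothetical) config."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    name = parser.get("resources", "model_file")
    return Path(root) / "resources" / name


def missing_files(root, cfg_text=EXAMPLE_CFG):
    """Return the expected files that are absent, relative to root."""
    root = Path(root)
    expected = [
        root / "data" / "sample_metadata.csv",
        root / "src" / "cord19_app.py",
        root / "src" / "config.cfg",
        model_path(root, cfg_text),
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

An empty list from `missing_files(".")`, run at the repository root, would indicate that the files named in the tree are present.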

Running the Streamlit App

cd src
streamlit run cord19_app.py

Note: On the first run, click the Update Data button in the app's sidebar. The update takes around 5-7 minutes; wait for it to finish before running a query. This is required only the first time, and again whenever you want to pull new data from the source.
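To give a rough idea of what a similarity query over word embeddings can look like, here is a minimal sketch. It is not the repository's implementation: it assumes a common approach (averaging word vectors and ranking by cosine similarity) and uses made-up 3-dimensional toy vectors in place of the 300-dimensional model. All names in it are hypothetical.

```python
import math

# Toy 3-dimensional word vectors standing in for the real 300-dimensional model.
TOY_VECTORS = {
    "virus": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.1],
    "vaccine": [0.1, 0.9, 0.1],
    "trial": [0.0, 0.8, 0.3],
}


def embed(text, vectors=TOY_VECTORS):
    """Average the vectors of known words; return None if no word is known."""
    found = [vectors[w] for w in text.lower().split() if w in vectors]
    if not found:
        return None
    return [sum(dim) / len(found) for dim in zip(*found)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, titles, vectors=TOY_VECTORS):
    """Return titles ordered from most to least similar to the query."""
    q = embed(query, vectors)
    if q is None:
        return []
    scored = []
    for title in titles:
        v = embed(title, vectors)
        if v is not None:
            scored.append((cosine(q, v), title))
    return [title for _, title in sorted(scored, reverse=True)]
```

With these toy vectors, `rank("virus", ["vaccine trial", "virus infection"])` places "virus infection" first, because its averaged vector points in nearly the same direction as the query.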

Screenshots: Landing, QueryResponse.


Languages

  • Python 100.0%