CSE6250_project_team2

Natural Language Processing on MIMIC III dataset - CSE6250 Class project, Spring 2019, Team 2. Authors: Neal Cheng, Joshua Sherfey, Oren Tevet, Kevin Zhu

Environment setup:

Python version python 3.6.5

Instructions: data files and saved model can be found on google drive: https://drive.google.com/file/d/1OI7W6SV_RgsSSvAtoB2M8O2-7rGvXKvM/view?usp=sharing
ULMFit saved model can be download from google drive: https://drive.google.com/file/d/1TAiG7nK7jThWDhA_b9VrDSSIkWyQiFwn/view?usp=sharing

Preprocessing Environment:

pip install --upgrade pip;

pip install keras;

pip install gensim;

pip install -U nltk;

pip install pyspark;

pip install fastai;

Training Environment:

conda create -n fastai python=3.6;

conda activate fastai;

conda install -c pytorch -c fastai fastai=1.0.51;

conda install pandas;

1. to run preprocess flow

python preprocess.py

OR

python preprocess_reduced.py

2. to generate word2vec model

python word2vec_generator.py

3. Train ULMFit model:

python Training.py

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Data		Data
CSE 6250 Project Final Paper.pdf		CSE 6250 Project Final Paper.pdf
CSE6250 NLP Project.pptx		CSE6250 NLP Project.pptx
README.md		README.md
Training.py		Training.py
V1.ipynb		V1.ipynb
cnn_model_test.py		cnn_model_test.py
cnn_model_train.py		cnn_model_train.py
cnn_models.py		cnn_models.py
df_sample.csv		df_sample.csv
evaluate.py		evaluate.py
feature_extraction_seq.py		feature_extraction_seq.py
preprocess.py		preprocess.py
preprocess_reduced.py		preprocess_reduced.py
pretrained_wordvector.py		pretrained_wordvector.py
split_train_val_test.py		split_train_val_test.py
testLemma.py		testLemma.py
word2vec_generator.py		word2vec_generator.py

otevet/CSE6250-Big-Data-Analytics-for-Healthcare

Folders and files

Latest commit

History

Repository files navigation

CSE6250_project_team2

About

Resources

Stars

Watchers

Forks

Languages