
xz2139/Machine-Learning-Project


Machine-Learning-Project

Two scripts preprocess the data and should be run in order: Judge_Bio_Dataset_Preprocess.py first, then Data+prep1.py.
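Because the two scripts must run in a fixed order, here is a minimal sketch of a Python driver that runs them one after the other and stops if either fails. It assumes both scripts sit in the same directory and take no command-line arguments. `run_preprocessing` is a hypothetical helper and is not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Preprocessing scripts, in the order the README gives.
PREPROCESS_SCRIPTS = ("Judge_Bio_Dataset_Preprocess.py", "Data+prep1.py")


def run_preprocessing(workdir, scripts=PREPROCESS_SCRIPTS):
    """Run each script in sequence with the current Python interpreter.

    check=True raises CalledProcessError on the first non-zero exit, so a
    later script never runs on incomplete output from an earlier one.
    """
    workdir = Path(workdir)
    for name in scripts:
        script = workdir / name
        if not script.is_file():
            raise FileNotFoundError(f"missing preprocessing script: {script}")
        print(f"running {name} ...", flush=True)
        # cwd=workdir keeps any relative paths inside the scripts working.
        subprocess.run([sys.executable, str(script)], cwd=workdir, check=True)
```

For example, call `run_preprocessing(".")` from the directory that contains both scripts.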

The sentencing text files must be unzipped first, and the paths in the scripts must be changed to relative paths before preprocessing. Preprocessing may take hours or even days, so we also uploaded the preprocessed data for running the models directly.
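Here is a minimal sketch of the unzipping step using Python's standard `zipfile` module. The README does not name the archives or the target folder, so `archive_dir`, `out_dir` and the helper `unzip_all` are hypothetical. Each archive is extracted into its own subfolder. The returned paths are relative to `out_dir`, which is the form the preprocessing scripts would need once they use relative paths.

```python
import zipfile
from pathlib import Path


def unzip_all(archive_dir, out_dir):
    """Extract every .zip in archive_dir into out_dir/<archive name>/.

    Returns the extracted file paths relative to out_dir.
    """
    archive_dir, out_dir = Path(archive_dir), Path(out_dir)
    extracted = []
    for archive in sorted(archive_dir.glob("*.zip")):
        target = out_dir / archive.stem  # one subfolder per archive
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # creates target and any nested folders
            for info in zf.infolist():
                if not info.is_dir():
                    extracted.append((target / info.filename).relative_to(out_dir))
    return extracted
```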

In DeepOLS&SecondStage.py and model_performance.py, we compare vectorizers. The file cc_merged_0429.csv is a table containing the raw text data. It is too big to upload to GitHub, so we provide a separate download link: https://drive.google.com/file/d/1b8OGjZf__hxe_olYdPzYqCofTtbusXhr/view?usp=sharing
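Because cc_merged_0429.csv is large, reading it row by row is safer than loading it into memory all at once. Here is a minimal sketch using the standard `csv` module, assuming the file is UTF-8 encoded. Its columns are not documented here, so the hypothetical `scan_csv` only returns the header and a row count. Long raw-text cells can exceed the `csv` module's default field size limit of 131,072 characters, which is why the sketch raises that limit first.

```python
import csv


def scan_csv(csv_path):
    """Stream a CSV file row by row and return (header, number of data rows).

    Only one row is held in memory at a time.
    """
    # Raise the per-field limit (default 131072 characters) so very long
    # text cells do not raise csv.Error. This setting is global to the
    # csv module. 2**31 - 1 fits in a C long on every platform.
    csv.field_size_limit(2**31 - 1)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```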

The models can be run with the bash script bashscript.sh. Please put the code in the same directory as the data. Both the Jupyter notebooks and the .py files should run without extra setup. We were not able to test the code on a server, because we could not install a virtual environment due to permission issues.
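The code and data are expected to be in the same directory, so a quick check before launching bashscript.sh can prevent a failed run. Here is a minimal sketch, assuming the scripts read cc_merged_0429.csv from their own directory. This README does not list the other data files, so `REQUIRED_DATA_FILES` and `missing_data_files` are hypothetical, and you should extend the list to match your copy of the repository.

```python
from pathlib import Path

# Hypothetical list: cc_merged_0429.csv is named in this README; add any
# other data files your copy of the repository expects.
REQUIRED_DATA_FILES = ("cc_merged_0429.csv",)


def missing_data_files(code_dir, required=REQUIRED_DATA_FILES):
    """Return the required data files that are not next to the code."""
    code_dir = Path(code_dir)
    return [name for name in required if not (code_dir / name).is_file()]
```

For example, `missing_data_files(".")` returns an empty list when every listed file is present in the current directory.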

We also provide Jupyter notebooks that illustrate the code.
