- Group ID: 3
- Group Members:
  - Tu Anqi (Team Leader)
  - Clarence Castillo
  - Tang Jiayun
  - Eko Edita Limanta
  - Andre Kristanto
  - Hans Albert Lianto
- Download `train.csv` and `test.csv` from the Kaggle competition Microsoft Malware Prediction.
- Unzip the CSV files and put them under the `/data` folder.
- Run `pip install -r requirements.txt` to install all required Python libraries (use Python version 3.6.8).
- (Optional) Run the script `1_check_data.py` to
  - generate a data summary (missing-value frequency, value counts for categorical data, etc.), and
  - generate plots to visualize the data (boxplots for numeric data, histograms for categorical data, etc.)
- (Optional) Run the script `2_analyze_data.py` to
  - perform data analytics, and
  - check the rationale behind the data preprocessing steps used by the experiment in the next step
- Run the script `3_experiment.py` to conduct the experiment on the prediction task.
- (Optional) Run the script `4_compare_performance.py` to compare the performance of all models.
- Run the script `5_submit.py` to generate predictions on the Kaggle test set for submission.
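
The steps above can also be driven from a small helper. The following is only a sketch and is not part of the repository: the function names (`missing_data_files`, `scripts_to_run`, `run_pipeline`) are hypothetical, and it assumes that the data folder is `data` relative to the repository root and that each script runs without command-line arguments.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# File names come from the steps above.
REQUIRED_DATA_FILES = ["train.csv", "test.csv"]

# (script name, is_optional), in the order the steps list them.
PIPELINE = [
    ("1_check_data.py", True),
    ("2_analyze_data.py", True),
    ("3_experiment.py", False),
    ("4_compare_performance.py", True),
    ("5_submit.py", False),
]


def missing_data_files(data_dir: Path) -> List[str]:
    """Return the required CSV names that are not present in data_dir."""
    return [name for name in REQUIRED_DATA_FILES
            if not (data_dir / name).is_file()]


def scripts_to_run(include_optional: bool = False) -> List[str]:
    """Return script names in order, skipping optional ones by default."""
    return [name for name, optional in PIPELINE
            if include_optional or not optional]


def run_pipeline(data_dir: Path = Path("data"),
                 include_optional: bool = False) -> None:
    """Check the data files, then run each script with the current interpreter.

    Assumption: the scripts take no arguments and are run from the
    repository root.
    """
    missing = missing_data_files(data_dir)
    if missing:
        raise FileNotFoundError(
            "Missing in {}: {}".format(data_dir, ", ".join(missing)))
    for script in scripts_to_run(include_optional):
        # check_call raises CalledProcessError if a script exits non-zero,
        # so later steps do not run on top of a failed one.
        subprocess.check_call([sys.executable, script])
```

Calling `run_pipeline()` from the repository root would run only the two required scripts, while `run_pipeline(include_optional=True)` would run all five in order. The code avoids features newer than Python 3.6 so that it stays compatible with the version recommended above.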