This was my first ML project after completing the Deep Learning Specialization on Coursera.
In this project, I mainly explored data processing and feature engineering techniques (see feature_engineering.py and Advanced_feature_engineering_refactored.py), including feature selection, missing data handling, and converting categorical data to ordinal data by mapping. I also studied the stacking technique; stacking is defined, with a specific example, in stacking.py.
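As a rough illustration of two of the techniques listed above (missing data handling and categorical-to-ordinal mapping), here is a minimal sketch using pandas. It is not taken from feature_engineering.py; the toy data, the column names `age`, `size`, and `size_ordinal`, and the choice of median/mode imputation are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [22.0, None, 35.0, 58.0],
    "size": ["small", "large", None, "small"],
})

# Missing data handling (assumed strategy): fill numeric gaps with the
# median and categorical gaps with the most frequent value.
df["age"] = df["age"].fillna(df["age"].median())
df["size"] = df["size"].fillna(df["size"].mode()[0])

# Categorical-to-ordinal conversion by mapping each category to a rank.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_ordinal"] = df["size"].map(size_order)

print(df)
```

An explicit mapping dictionary keeps the ordering of categories visible and under your control, which matters when the categories have a natural rank.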
The highest accuracy was 0.80861, achieved using feature engineering with 5 layers of stacking. This ranks in the top 5% of the challenge leaderboard.
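The stacking idea can be sketched with scikit-learn's `StackingClassifier`. This is only a sketch under stated assumptions, not the code in stacking.py: it uses synthetic data, a single stacking layer rather than five, and base models and a meta-model chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for the real features (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base models: their cross-validated predictions become the inputs
# of the meta-model. The model choices here are illustrative only.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(random_state=0)),
]

# One stacking layer: a logistic regression learns how to combine
# the base models' out-of-fold predictions.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Training the meta-model on out-of-fold predictions (the `cv` argument) keeps it from simply memorizing base models that have already seen the training labels.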
Throughout this project, I did a good deal of independent study to understand different statistical concepts and various models; the summaries can be found here.