The dataset comes from a work assignment for a Data Scientist job application. For this assignment, I completed a mini data science project covering an end-to-end pipeline: data cleaning, data preparation, feature exploration and engineering, model prototyping and selection, and evaluation.
The goal of the project is to predict the interest rate of loan applications from a mixture of highly heterogeneous data columns. Some columns contain useful features, while others are irrelevant. Some columns contain many missing values that need to be imputed properly.
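As a rough illustration of what imputing missing values can look like, here is a minimal sketch in plain Python (standard library only, so it runs without extra dependencies). It assumes a simple strategy of filling numeric columns with the median and other columns with the most frequent value; this is one common choice, not necessarily the one used in the notebooks. The helper `impute_column` and the toy columns `annual_income` and `home_ownership` are hypothetical and not taken from the assignment dataset.

```python
from statistics import median, mode


def impute_column(values):
    """Fill None entries: median for numeric columns, most frequent value otherwise."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to learn a fill value from; return the column unchanged.
        return list(values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    fill = median(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]


# Hypothetical toy columns (assumed for illustration only).
annual_income = [52000.0, None, 61000.0, 48000.0]
home_ownership = ["RENT", "MORTGAGE", None, "RENT"]

income_filled = impute_column(annual_income)
ownership_filled = impute_column(home_ownership)
```

In a real pipeline the fill values would be computed on the training split only and then reused on the test split, so that no information leaks from the evaluation data.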
The Python code and results are summarized in the following Jupyter notebooks: