dshk1-loan-default

Workshop on Kaggle's Loan Default dataset for the General Assembly Hong Kong Data Science course.

Competition website: http://www.kaggle.com/c/loan-default-prediction

Download the data: https://drive.google.com/file/d/0B7tb9lqp600SNE0zeFUwNE9NWlk/edit?usp=sharing

Sample submission file: https://drive.google.com/file/d/0B7tb9lqp600Sd1EzR01pQmttT0U/edit?usp=sharing

My contact: hgcrpd@gmail.com

Here's the link to my actual code: https://github.com/hxu/kaggle-loan-default

Pseudocode for overall pipeline:

Get the training data X and the known values y.

Figure out what model you want to run:

- Select features
- Cross-validation, to find the best features
- Grid search, to find the best parameters for the model
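As a rough illustration of the model-selection step, the sketch below runs a grid search with cross-validation over both the number of features kept and a model parameter. It assumes scikit-learn; the synthetic data, the `SelectKBest` selector, the logistic regression model and the parameter values are all placeholders chosen for the example, not the settings used in the workshop code.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the training data (assumption: 200 rows, 10 features).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
# Hypothetical target: default (1) depends mostly on the first two features.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Feature selection and the model live in one pipeline so that
# cross-validation re-selects features inside every fold.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Candidate values are illustrative only.
param_grid = {
    "select__k": [2, 5, 10],
    "model__C": [0.1, 1.0, 10.0],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Putting the selector inside the pipeline keeps the held-out fold from influencing which features are chosen, which is the usual reason to search over features and parameters together.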

Get the training data X and the known values y.

Set up the default pipeline:

- Extract the features we already selected
- Clean up the data:
  - Fill NAs
  - Remove object-typed and unwanted columns

Fit the default model on X and y.
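The default-model steps above could be sketched as a single scikit-learn pipeline. This is a minimal sketch under assumptions: the column names (`f1`, `f2`, `note`, `unwanted`), the median fill strategy and the logistic regression model are hypothetical stand-ins, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical names for the features chosen in the earlier step.
SELECTED_FEATURES = ["f1", "f2", "note"]


def extract_and_drop_objects(df):
    """Keep the selected features, then drop any non-numeric columns."""
    return df[SELECTED_FEATURES].select_dtypes(include="number")


# Synthetic stand-in for the training data.
rng = np.random.RandomState(0)
n = 300
X = pd.DataFrame({
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
    "note": ["text"] * n,       # object column, removed during cleaning
    "unwanted": np.arange(n),   # not selected, so it never reaches the model
})
X.loc[X.index % 10 == 0, "f2"] = np.nan  # some missing values to fill
# Hypothetical target: 1 = default, 0 = no default.
y = (X["f1"] + 0.3 * rng.normal(size=n) > 1).astype(int).to_numpy()

default_pipeline = Pipeline([
    ("extract", FunctionTransformer(extract_and_drop_objects)),
    ("fill_na", SimpleImputer(strategy="median")),
    ("model", LogisticRegression(max_iter=1000)),
])
default_pipeline.fit(X, y)

default_pred = default_pipeline.predict(X)
accuracy = (default_pred == y).mean()
print(accuracy)
```

Because the cleaning steps are part of the fitted pipeline, calling `default_pipeline.predict` on new data applies the same extraction and the same fill values learned from the training data.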

Select the rows from the training data X and known values y that are predicted to default.

Set up the loss-given-default pipeline:

- Extract the features
- Clean up the data

Fit the loss (random forest) model on the defaulted Xs and ys.
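A minimal sketch of the loss-given-default step, assuming scikit-learn's `RandomForestRegressor` as the random forest. The data is synthetic, and `default_flags` is a stand-in for the output of the default model; the feature-extraction and cleaning steps are omitted because the synthetic data is already numeric and complete.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the cleaned training features and known losses.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
# Hypothetical losses: 0 for most rows, between 1 and 100 for the rest.
y_loss = np.where(
    X[:, 0] > 1.0,
    np.clip(np.round(20 + 10 * X[:, 1]), 1, 100),
    0,
)

# Stand-in for the default model's predictions (1 = default).
default_flags = (y_loss > 0).astype(int)

# Keep only the defaulted rows for the loss model.
mask = default_flags == 1
X_defaulted, y_defaulted = X[mask], y_loss[mask]

# Illustrative settings, not tuned.
loss_model = RandomForestRegressor(n_estimators=50, random_state=0)
loss_model.fit(X_defaulted, y_defaulted)

loss_pred = loss_model.predict(X_defaulted)
print(len(X_defaulted), loss_pred[:5])
```

Training the regressor only on defaulted rows keeps the large number of zero-loss rows from dominating the fit.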

Get the test data, test_X.

Run it through the default pipeline:

- Extract the features
- Clean the data in exactly the same way as the training data

Predict defaults = [0, 1, 0, 0, 0, 1] (~100,000 values)

Select the rows from the test data that we think default.

Run them through the loss pipeline:

- Extract the features
- Clean the data

Predict loss = [2, 48, 100, ..., 0] (~10,000 values)

Combine the predictions: if predicted default (== 1), set the value to the predicted loss; otherwise leave it as 0.

Final prediction = [0, 0, 0, 2, 0, 0, 48, 0, ..., 10] (~100,000 values)
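The combination step can be sketched with a NumPy boolean mask. The arrays here are tiny made-up examples; the only requirement is that the loss predictions are in the same order as the rows flagged as defaults.

```python
import numpy as np

# Output of the default pipeline: one flag per test row (1 = default).
default_pred = np.array([0, 1, 0, 0, 0, 1])
# Output of the loss pipeline: one value per row flagged as default.
loss_pred = np.array([2.0, 48.0])

# Start from zero loss everywhere, then fill in the predicted defaults.
final_pred = np.zeros(len(default_pred))
final_pred[default_pred == 1] = loss_pred

print(final_pred.tolist())  # [0.0, 2.0, 0.0, 0.0, 0.0, 48.0]
```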

Write to file.

Submit.
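Writing the file might look like the sketch below, which uses only the standard library. The `id` and `loss` column names and the output path are assumptions; check them against the sample submission file before submitting. The ids and losses are made-up values.

```python
import csv


def write_submission(path, ids, losses):
    """Write one row per test id with its predicted loss."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "loss"])  # assumed header
        writer.writerows(zip(ids, losses))


# Hypothetical ids and final predictions.
write_submission("submission.csv", [1, 2, 3], [0, 2, 48])
```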
