Kaggle Homesite scripts

Scripts and libraries used to participate in the Kaggle competition "Homesite Quote Conversion", where the objective is to predict which customers will purchase a quoted insurance plan. https://www.kaggle.com/c/homesite-quote-conversion

The dataset consists of a training set of around 250,000 observations with 261 features each.

The predictive model developed here averages the predictions of two simple models: a Gradient Boosted classifier and a K-Nearest-Neighbours classifier. Parameter tuning was performed with the benchmark scripts.
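
The averaging step can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `average_predictions`, the `weight` parameter, and the sample probabilities are all assumptions. The description above only states that the two classifiers' predictions are averaged.

```python
def average_predictions(proba_a, proba_b, weight=0.5):
    """Blend two lists of predicted probabilities (hypothetical helper).

    weight is the share given to proba_a; 0.5 gives a plain average.
    """
    if len(proba_a) != len(proba_b):
        raise ValueError("both models must score the same observations")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(proba_a, proba_b)]


# Made-up probabilities standing in for the two classifiers' outputs.
xgb_proba = [0.90, 0.20, 0.60]
knn_proba = [0.70, 0.40, 0.50]
blended = average_predictions(xgb_proba, knn_proba)
```

An unequal `weight` would favour one model over the other; whether that helps is something to check against held-out data.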

This model gets a score of 0.96144, whereas the leader reaches a score of 0.97006. Scores are computed on a test set using the area under the ROC curve metric.
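
For readers unfamiliar with the metric: the area under the ROC curve equals the probability that a randomly chosen positive observation receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by brute force on made-up labels and scores. The function name is illustrative only; on real data a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def auc_pairwise(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 of the 4 positive/negative pairs are ordered correctly.
labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]
auc = auc_pairwise(labels, scores)
```

The pairwise loop is quadratic in the number of observations, so it is only meant to show what the score measures.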

Files

file_handler.py: library of functions providing an abstraction layer over the files being handled (csv, cache, json, ...)
summary.py: library of functions for plotting and describing the dataset's features
utils.py: library of functions to manipulate data (dates, categorical features, ...)
benchmark_xgb.py: benchmark of the Gradient Boosted classification (xgboost library) with parameter tuning
benchmark_knn.py: benchmark of the K-Nearest-Neighbours classification (sklearn library) with parameter tuning
train_models.py: script that trains the classifiers and serializes them into the models folder
predict.py: loads the classifiers from the models folder and performs the prediction (output is written to the results folder)
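
As an illustration of the kind of preprocessing `utils.py` is described as covering (dates and categorical features), here is a small standard-library sketch. The helper names, the date format, and the sample values are assumptions made for the example; the repository's actual functions may look quite different.

```python
from datetime import datetime


def split_date(text, fmt="%Y-%m-%d"):
    """Turn a date string into numeric features (hypothetical helper)."""
    parsed = datetime.strptime(text, fmt)
    # weekday() returns 0 for Monday through 6 for Sunday.
    return {"year": parsed.year, "month": parsed.month, "weekday": parsed.weekday()}


def encode_categories(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Made-up inputs for demonstration.
date_features = split_date("2013-08-16")
encoded, mapping = encode_categories(["B", "F", "B", "J"])
```

Integer codes suit tree-based models reasonably well; a distance-based classifier such as K-Nearest-Neighbours may be better served by one-hot encoding, since integer codes impose an arbitrary ordering.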

Directory structure

data: contains the dataset in 2 subfolders (originals in data/csv, cache in data/cache)
models: contains the trained, serialized classifiers
plots: directory reserved for plots
results: contains CSV files for Kaggle submission
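
The split between data/csv and data/cache suggests a read-through cache: parse the original CSV once, then reuse a faster serialized copy. The sketch below shows one way this could work using only the standard library. The function name `load_rows`, the pickle format, and the demo columns are assumptions; the description above does not say how file_handler.py implements its cache.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def load_rows(csv_path, cache_dir):
    """Return the CSV rows as a list of dicts, reusing a pickle cache if present."""
    csv_path, cache_dir = Path(csv_path), Path(cache_dir)
    cache_path = cache_dir / (csv_path.stem + ".pkl")
    if cache_path.exists():
        with cache_path.open("rb") as handle:
            return pickle.load(handle)
    with csv_path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    cache_dir.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as handle:
        pickle.dump(rows, handle)
    return rows


# Demo in a temporary directory with a made-up two-row file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "csv" / "train.csv"
    source.parent.mkdir(parents=True)
    source.write_text("id,label\n1,0\n2,1\n")
    first_read = load_rows(source, Path(tmp) / "cache")    # parses the CSV
    cache_exists = (Path(tmp) / "cache" / "train.pkl").exists()
    second_read = load_rows(source, Path(tmp) / "cache")   # served from the cache
```

This version never invalidates the cache, so a changed CSV would require deleting the cached file by hand.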

RomainSenesi/kaggle_homesite
