RomainSenesi/kaggle_homesite

Kaggle Homesite scripts

Scripts and libraries used to participate in the Kaggle competition "Homesite Quote Conversion", where the objective is to predict which customers will purchase a quoted insurance plan. https://www.kaggle.com/c/homesite-quote-conversion

The dataset consists of a training set with 261 features and around 250,000 observations.

The predictive model developed here averages the predictions of two simple models (Gradient Boosted classification and K-Nearest-Neighbours classification). Parameter tuning was performed with the benchmark scripts.
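The averaging step can be sketched as follows. This is a minimal illustration, not the code from train_models.py or predict.py: the function name `average_probabilities`, the `weight` parameter, and the sample scores are all assumptions, and the README does not say whether the repository weights the two models equally.

```python
def average_probabilities(xgb_probs, knn_probs, weight=0.5):
    """Blend two lists of positive-class probabilities.

    `weight` is the share given to the first model; 0.5 is a plain average.
    """
    if len(xgb_probs) != len(knn_probs):
        raise ValueError("both models must score the same observations")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(xgb_probs, knn_probs)]


# Hypothetical scores for three quotes, not taken from the competition data.
blended = average_probabilities([0.9, 0.2, 0.6], [0.7, 0.4, 0.5])
```

Averaging probabilities rather than hard class labels keeps the ranking information that the area under the ROC curve metric depends on.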

This model scores 0.96144, while the leader scores 0.97006 (scores are computed on a test set using the area under the ROC curve metric).
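For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the sample data are illustrative assumptions, and this is not how Kaggle or the repository computes the score.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by comparing every positive/negative pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores for four quotes.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ordered correctly
```

The pairwise loop is quadratic in the number of observations, so for a dataset of this size a rank-based implementation such as `sklearn.metrics.roc_auc_score` would be the practical choice.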

Files

  • file_handler.py: library of functions providing an abstraction level on top of the manipulated files (csv, cache, json,...)
  • summary.py: library of functions for plotting and describing the dataset's features
  • utils.py: library of functions to manipulate data (dates, categorical features,...)
  • benchmark_xgb.py: benchmark of the Gradient Boosted classification (xgboost library) with parameter tuning
  • benchmark_knn.py: benchmark of the K-Nearest-Neighbours classification (sklearn library) with parameter tuning
  • train_models.py: script that trains the classifiers and serializes them into the models folder
  • predict.py: script that loads the classifiers from the models folder and performs the prediction (output is written to the results folder)
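The split between train_models.py and predict.py relies on saving a model to disk and loading it back later. A minimal sketch of that round trip, assuming `pickle` as the serialization format (the README does not say which one the repository uses): the helpers `save_model` and `load_model`, the file name, and the plain dict standing in for a trained classifier are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a model object, creating the target folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a previously serialized model object."""
    with path.open("rb") as handle:
        return pickle.load(handle)


# A plain dict stands in for a trained classifier (assumption for illustration).
toy_model = {"name": "knn", "n_neighbors": 5}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "knn.pkl"
    save_model(toy_model, model_path)
    restored = load_model(model_path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.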

Directory structure

  • data: contains the dataset in 2 subfolders (originals in data/csv, cache in data/cache)
  • models: contains the trained, serialized classifiers
  • plots: directory reserved for plots
  • results: contains csv files for Kaggle submission
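As an illustration of what a file in the results folder might look like, the sketch below writes a two-column submission file. The column names `QuoteNumber` and `QuoteConversion_Flag` are assumed from the competition's submission format and should be checked against the sample submission. The helper `write_submission` and the file name are hypothetical, not taken from predict.py.

```python
import csv
import tempfile
from pathlib import Path


def write_submission(path, quote_numbers, probabilities):
    """Write one row per quote: its identifier and predicted probability."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        # Header names are an assumption about the competition format.
        writer.writerow(["QuoteNumber", "QuoteConversion_Flag"])
        for quote_number, probability in zip(quote_numbers, probabilities):
            writer.writerow([quote_number, probability])


# Hypothetical quote identifiers and predictions.
with tempfile.TemporaryDirectory() as tmp:
    submission_path = Path(tmp) / "results" / "submission.csv"
    write_submission(submission_path, [3, 5], [0.8, 0.3])
    lines = submission_path.read_text().splitlines()
```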
