sjuvekar/Kaggle-Dato

This is the code for the models that ended up in the top 10 of Kaggle's 'Dato - Truly Native?' contest:

https://www.kaggle.com/c/dato-native

This code mainly builds XGBoost and Keras MLP models using HashingVectorizer (http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.HashingVectorizer.html) features. Here are the steps to build the models:

  • Assume that all data (i.e. all csv and html_txt files) is present in the data/ directory.
  • Build HashingVectorizer features first using src/FeatureExtractor.py (see the first sketch below).
  • Learn a slightly weak Naive Bayes classifier using all features.
  • Perform feature selection using the Naive Bayes feature importances (second sketch below).
  • Convert those important features to LibSVM format for easy sparse encoding.
  • Learn both XGBoost and Keras models using these features (third sketch below).
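
As a rough illustration of the feature-extraction step (not the exact contents of src/FeatureExtractor.py), a HashingVectorizer can turn the raw page text into a fixed-width sparse matrix. The csv layout, column names, and vectorizer settings below are assumptions:

```python
# Minimal sketch of the HashingVectorizer step. The csv/column names and
# the html_txt layout are assumptions, not the exact ones used by
# src/FeatureExtractor.py.
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer

train = pd.read_csv("data/train.csv")  # assumed columns: "file", "sponsored"

def read_page(name):
    with open("data/html_txt/" + name, encoding="utf-8", errors="ignore") as f:
        return f.read()

# alternate_sign=False keeps the hashed counts non-negative so that a
# multinomial Naive Bayes model can be fit on them in the next step.
vectorizer = HashingVectorizer(n_features=2 ** 20, stop_words="english",
                               alternate_sign=False)
X_train = vectorizer.transform(read_page(name) for name in train["file"])
y_train = train["sponsored"].values
```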

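The Naive Bayes and feature-selection steps could then look roughly like the following. Using the gap between per-class log probabilities as an importance score, and the number of kept columns, are both assumptions rather than what the repository scripts actually do:

```python
# Rough sketch of the Naive Bayes feature-selection and LibSVM-conversion
# steps, continuing from X_train / y_train above. The importance heuristic
# and the number of kept columns are assumptions.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.datasets import dump_svmlight_file

nb = MultinomialNB()   # the "slightly weak" classifier from the list above
nb.fit(X_train, y_train)

# Use the gap between the two per-class log probabilities as a crude
# per-feature importance and keep the highest-scoring hashed columns.
importance = np.abs(nb.feature_log_prob_[1] - nb.feature_log_prob_[0])
keep = np.argsort(importance)[-100000:]
X_small = X_train[:, keep]

# LibSVM format stays sparse on disk and is easy to feed to XGBoost later.
dump_svmlight_file(X_small, y_train, "data/train_selected.libsvm")
```
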
We have provided a simple run.sh script to automate these steps. An ensemble of these two models alone can give a validation AUC score close to 0.984 (a rough illustration of the averaging is sketched at the end).
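
Training the two final models on the selected features might look like this; the hyperparameters are placeholders (not the values tuned for the contest) and the Keras code assumes the Keras 2 API:

```python
# Sketch of the two final models on the selected features; all
# hyperparameters here are placeholders.
import xgboost as xgb
from sklearn.datasets import load_svmlight_file
from keras.models import Sequential
from keras.layers import Dense, Dropout

X_sel, y = load_svmlight_file("data/train_selected.libsvm")

# Gradient-boosted trees on the sparse matrix.
dtrain = xgb.DMatrix(X_sel, label=y)
params = {"objective": "binary:logistic", "eval_metric": "auc",
          "max_depth": 8, "eta": 0.1}
bst = xgb.train(params, dtrain, num_boost_round=500)

# A small MLP on the same features (densified here for brevity; in
# practice you would feed the data in batches to save memory).
mlp = Sequential([
    Dense(256, activation="relu", input_dim=X_sel.shape[1]),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])
mlp.compile(optimizer="adam", loss="binary_crossentropy")
mlp.fit(X_sel.toarray(), y, epochs=10, batch_size=128)
```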

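The ensemble mentioned above is just a blend of the two models' predicted probabilities. A minimal sketch, assuming a held-out split X_valid / y_valid and an equal 50/50 weighting (both assumptions):

```python
# Minimal ensemble sketch on a held-out validation split. X_valid and
# y_valid are assumed to exist; the equal weighting is also an assumption.
from sklearn.metrics import roc_auc_score

xgb_pred = bst.predict(xgb.DMatrix(X_valid))
mlp_pred = mlp.predict(X_valid.toarray()).ravel()

blend = 0.5 * xgb_pred + 0.5 * mlp_pred
print("validation AUC:", roc_auc_score(y_valid, blend))
```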