This is the code for the models that ended up in the top 10 of Kaggle's 'Dato - Truly Native?' contest
https://www.kaggle.com/c/dato-native
This code mainly builds XGBoost and Keras MLP models using HashingVectorizer (http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.HashingVectorizer.html) features. Here are the steps to build the models (each step is sketched in code below):
- Assume that all data (i.e. all the CSV and html_txt files) is present in the data/ directory.
- Build the HashingVectorizer features first, using src/FeatureExtractor.py
- Learn a slightly weak Naive-Bayes classifier on all of these features
- Select the most important features using the Naive-Bayes feature importances
- Convert the selected features to LibSVM format for easy sparse encoding
- Train both XGBoost and Keras models on these features
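
As a rough illustration of the feature-extraction step, here is a minimal HashingVectorizer sketch. The file layout under data/, the `documents` variable, and all parameters are assumptions; the real settings live in src/FeatureExtractor.py:

```python
import glob
from sklearn.feature_extraction.text import HashingVectorizer

# Assumed layout: one raw text file per page under data/html_txt/ (illustrative).
documents = [open(path, encoding="utf-8", errors="ignore").read()
             for path in sorted(glob.glob("data/html_txt/*"))]

# Illustrative settings, not the repo's actual parameters.
vectorizer = HashingVectorizer(
    n_features=2 ** 20,    # fixed-size hashed feature space, no vocabulary kept
    alternate_sign=False,  # non-negative counts, as required by MultinomialNB
    ngram_range=(1, 2),    # assumed: unigrams and bigrams of the page text
)

X = vectorizer.transform(documents)  # scipy.sparse CSR, shape (n_docs, 2**20)
```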
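
For the Naive-Bayes step, one common way to turn the classifier into a feature selector is to rank hashed features by the gap between their per-class log probabilities; the repo's exact rule may differ, and `y` and `k` here are placeholders:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# `X` is the hashed matrix from the previous sketch. `y` is assumed to be a
# NumPy array of 0/1 labels aligned with `documents`, read from the training CSV.
nb = MultinomialNB()
nb.fit(X, y)

# Score each hashed feature by how far apart its per-class log probabilities
# are; a large gap means the feature separates the two classes well.
importance = np.abs(nb.feature_log_prob_[1] - nb.feature_log_prob_[0])

# Keep the top-k features. k is an illustrative choice, not the repo's value.
k = 100_000
selected = np.sort(np.argsort(importance)[-k:])
X_selected = X[:, selected]
```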
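
Converting the selected features to LibSVM format is a one-liner with scikit-learn; the output path is illustrative:

```python
from sklearn.datasets import dump_svmlight_file

# Write the reduced sparse matrix in LibSVM format so the downstream models
# can load it without re-running feature extraction.
dump_svmlight_file(X_selected, y, "data/train.libsvm")
```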
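
Here is a minimal sketch of the two model fits, continuing from the arrays above; all hyperparameters are illustrative, not the repo's tuned values. Dense Keras layers cannot consume scipy sparse input directly, so the MLP streams densified mini-batches:

```python
import numpy as np
import xgboost as xgb
from tensorflow import keras

# --- XGBoost: load the LibSVM file directly (URI syntax for recent versions) ---
dtrain = xgb.DMatrix("data/train.libsvm?format=libsvm")
params = {"objective": "binary:logistic", "eval_metric": "auc",
          "eta": 0.1, "max_depth": 8}           # illustrative hyperparameters
bst = xgb.train(params, dtrain, num_boost_round=500)

# --- Keras MLP: densify one mini-batch at a time instead of the full matrix ---
def dense_batches(X, y, batch_size=256):
    n = X.shape[0]
    while True:  # infinite generator; epochs are bounded by steps_per_epoch
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            rows = order[start:start + batch_size]
            yield X[rows].toarray(), y[rows]

mlp = keras.Sequential([
    keras.Input(shape=(X_selected.shape[1],)),
    keras.layers.Dense(512, activation="relu"),  # assumed layer sizes
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])
mlp.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=[keras.metrics.AUC()])
mlp.fit(dense_batches(X_selected, y),
        steps_per_epoch=int(np.ceil(X_selected.shape[0] / 256)),
        epochs=10)
```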
We have provided a simple run.sh script to automate these steps. An ensemble of these two models alone can give you a validation AUC of close to 0.984 (a sketch of such a blend follows).
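
The blending rule isn't specified here; a plain average of the two models' predicted probabilities is the simplest choice. `X_val` and `y_val` stand for a hypothetical held-out split prepared the same way as the training features:

```python
import xgboost as xgb
from sklearn.metrics import roc_auc_score

p_xgb = bst.predict(xgb.DMatrix(X_val))       # DMatrix accepts scipy sparse input
p_mlp = mlp.predict(X_val.toarray()).ravel()  # densify for the MLP
p_blend = 0.5 * p_xgb + 0.5 * p_mlp           # equal weights are an assumption
print("validation AUC:", roc_auc_score(y_val, p_blend))
```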