
ALTEGRAD Challenge Fall 2018

This repository contains our code for the Kaggle in-class challenge: https://www.kaggle.com/c/altegrad-challenge-fall-17

This challenge was inspired by the Quora Question Pairs challenge: https://www.kaggle.com/c/quora-question-pairs

We finished in first place on both the Public and Private leaderboards.

Contact information

Kaggle in-class team name: AitAzzi - Ohleyer - Sutton

For any question or request related to this code, please send an email to one of these addresses: abderrahim.aitazzi@ens-paris-saclay.fr, sebastien.ohleyer@gmail.com, michael.sutton@student.ecp.fr.

Code organization

Feature generation

Each feature can be generated from the Feature_generation.ipynb notebook. This notebook calls the .py files in the feature_engineering directory.

Each .py file computes its corresponding features and writes them to a CSV file in the data directory. For convenience, we directly provide CSV files containing all features.
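As an illustration of this pattern, here is a minimal, hypothetical feature script (the feature, function, and file names are ours for illustration, not the repo's actual files): it computes one feature per question pair and writes it to the data directory.

```python
import pandas as pd

def common_word_ratio(q1: str, q2: str) -> float:
    """Jaccard overlap between the word sets of the two questions."""
    w1, w2 = set(str(q1).lower().split()), set(str(q2).lower().split())
    if not (w1 | w2):
        return 0.0
    return len(w1 & w2) / len(w1 | w2)

# Assumed input: a train.csv with question1/question2 columns, as in Quora's data.
train = pd.read_csv("data/train.csv")
feature = pd.DataFrame({
    "common_word_ratio": [
        common_word_ratio(q1, q2)
        for q1, q2 in zip(train["question1"], train["question2"])
    ]
})
feature.to_csv("data/common_word_ratio.csv", index=False)
```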

LightGBM

Use the Lightgb_classification.ipynb notebook to perform classification. The parameters leading to our best submission are provided in this notebook (apart from the random initialization of the model).

This notebook calls load_features.py, which loads the features from every CSV file in the data directory and combines them into a single pandas DataFrame.
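A minimal sketch of what such a helper could look like (the actual code in load_features.py may differ):

```python
import glob
import pandas as pd

def load_features(data_dir: str = "data") -> pd.DataFrame:
    """Read every feature CSV in data_dir and concatenate their columns."""
    paths = sorted(glob.glob(f"{data_dir}/*.csv"))
    return pd.concat([pd.read_csv(p) for p in paths], axis=1)

features = load_features()
```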

For each classifier fit, a CSV file containing the model, its parameters, and its cross-validation results is written to the log directory. For comparison with our previous fits, we also provide some of our results.
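The fit-and-log pattern could look like the following sketch (parameter values are placeholders rather than our winning configuration, and the file names are assumptions):

```python
import csv
import os
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import cross_val_score

# Features and labels; in the notebook these come from load_features.py
# and the training labels (file names here are illustrative).
X = pd.read_csv("data/features.csv")
y = pd.read_csv("data/train.csv")["is_duplicate"]

params = {"num_leaves": 31, "learning_rate": 0.05, "n_estimators": 500}
model = lgb.LGBMClassifier(**params)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_log_loss")

# Append the model name, its parameters, and CV results to a log CSV.
os.makedirs("log", exist_ok=True)
with open("log/lightgbm_results.csv", "a", newline="") as f:
    csv.writer(f).writerow(["LGBMClassifier", params, scores.mean(), scores.std()])
```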

Neural networks

Use the Neural_net.ipynb notebook to perform classification with the three networks we experimented with (Fully connected, LSTM, LSTM with features). The notebook is divided into three sections (always run the first section).

  1. Load data and features

  2. Fully connected network
    This part is divided into the following subparts: splitting the data into training and validation sets, model definition, training, loss visualisation, and test prediction.

  3. LSTM
    We first prepare the sentences to feed to the neural network, then split the data into training and validation sets. After this preparation step, there are two similar pieces of code, corresponding respectively to the network that uses only text and the one that also uses features. Both have the same structure as described in 2.
    In the model definition, you can comment or uncomment lines to choose whether to merge by multiplication or concatenation, and whether to apply L2 normalisation; a minimal sketch of this merge step follows the list.
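Below is a minimal Keras sketch of the merge step for the LSTM network. The framework choice, layer sizes, and names are our assumptions for illustration, not the repo's exact code.

```python
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB, EMB_DIM = 30, 20000, 100  # illustrative sizes, not the repo's settings

# Shared encoder applied to both questions.
embed = layers.Embedding(VOCAB, EMB_DIM)
encode = layers.LSTM(64)

q1_in = layers.Input(shape=(MAX_LEN,))
q2_in = layers.Input(shape=(MAX_LEN,))
q1_vec = encode(embed(q1_in))
q2_vec = encode(embed(q2_in))

# Optional L2 normalisation of each encoding (toggle by commenting):
# q1_vec = layers.UnitNormalization()(q1_vec)
# q2_vec = layers.UnitNormalization()(q2_vec)

# Merging method: concatenation, or element-wise multiplication instead.
merged = layers.Concatenate()([q1_vec, q2_vec])
# merged = layers.Multiply()([q1_vec, q2_vec])

hidden = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(hidden)
model = Model([q1_in, q2_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```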

Requirements

All packages used in this code were installed at their latest version, except for:

  • NetworkX: v1.11
