GitHub - nickmcadden/Kaggle

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
BNPParibas/code		BNPParibas/code
Bimbo		Bimbo
Caterpillar		Caterpillar
DSB2016/code		DSB2016/code
Homedepot		Homedepot
NCAA		NCAA
Otto		Otto
Rossmann/code		Rossmann/code
Santander/code		Santander/code
Springleaf		Springleaf
Statefarm		Statefarm
TNC/code		TNC/code
Twosigma/code		Twosigma/code
.gitignore		.gitignore
readme		readme

Repository files navigation

Kaggle Competitions
________________________

This repository contains code used to compete in Kaggle competitions from 2015 onwards.
The competitions are based mostly on commercial briefs although some are purely for fun (March Machine Learning Mania is a basketball game prediction challenge)

The final ranking obtained in the competitions can be seen here:

https://www.kaggle.com/nickmcadden/competitions

The Code
____________

The code is written mostly in Python (there are some R scripts too) and makes heavy use of the Scikit Learn libraries.
The more recent projects use a standardised directory structure

> code
> input
> output

Where the input directory contains the very large source data files usually in csv, h5 or json file format. This data is not replicated here.
The output directory usually contains a number of csv files which are the individual submissions to the competition. These csv files are ignored in the github repository.

The code directories usually contain many files which are of the following types

Data import and transformation
--------------------------------

These files are prefixed with the word 'data'.
They perform the job of reading the training and testing data files, and handling the following tasks.

Reading the data files (usually in csv, h5 format)
Imputing missing values
Converting categorical data into numeric format
Aggregating multiple input files into one if required

Data processing
----------------

The main data processing is handled in files which are usually named with reference to the algorithm being used.
These files will use one and only one of the data files within the same directory.
Examples:
xgb.py will process the data using the XGBoost algorithm.
rf.py will process the data using the Random Forest algorithm.
etc.py will process the data using the Extra Trees Classifier algorithm.

The results are written to csv format within this section of code.

Utility Files
---------------
These are additional files to help with the analysis. Examples include:
Feature selection (useful for high dimensional data)
Clustering (creating aditional data columns)
Ensembling (aggregation of output csv predictions)

About

No description, website, or topics provided.

Readme

Activity

0 stars

2 watching

0 forks

Report repository

Releases

No releases published

Packages

No packages published

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BNPParibas/code

BNPParibas/code

Bimbo

Bimbo

Caterpillar

Caterpillar

DSB2016/code

DSB2016/code

Homedepot

Homedepot

NCAA

NCAA

Otto

Otto

Rossmann/code

Rossmann/code

Santander/code

Santander/code

Springleaf

Springleaf

Statefarm

Statefarm

TNC/code

TNC/code

Twosigma/code

Twosigma/code

.gitignore

.gitignore

readme

readme

Repository files navigation

About

Releases

Packages

Languages

nickmcadden/Kaggle

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Languages