Laurae2/Santander

Santander

Kaggle Santander

TO-DO

Nothing yet.

Important topics

The data set is unbalanced and noisy.

What to know before you start working on Santander

Major issues:

  • There are many duplicated rows.
  • Many variables are linear combinations of other variables: any machine learning algorithm that requires a well-conditioned matrix will fail hard (e.g. Linear Discriminant Analysis).
  • There are many outliers, such as values of 99999999.
  • Some variables are constant.
  • Some duplicated rows have different target values. I'll compile a list later.
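
The issues listed above can be checked quickly before any modelling. The following is a minimal sketch in Python with pandas and NumPy, not the method used in this repository. The function name `audit`, the target column name `TARGET`, and the sentinel threshold are assumptions made for illustration, and the sketch assumes numeric features without missing values.

```python
import numpy as np
import pandas as pd


def audit(df: pd.DataFrame, target: str = "TARGET", sentinel: float = 99999999) -> dict:
    """Count the data issues listed above (hypothetical helper, assumed column names)."""
    features = df.drop(columns=[target])
    feature_cols = list(features.columns)

    # Rows whose feature values repeat an earlier row.
    duplicated_rows = int(features.duplicated().sum())

    # Columns holding a single distinct value.
    constant_cols = [c for c in feature_cols if features[c].nunique(dropna=False) <= 1]

    # Linear dependence: with constant columns removed, every column beyond
    # the matrix rank is a linear combination of the others.
    numeric = features.drop(columns=constant_cols).select_dtypes(include="number")
    # Scale each column by its largest magnitude. This leaves the rank
    # unchanged but keeps huge sentinel values from distorting the estimate.
    scaled = (numeric / numeric.abs().max()).to_numpy(dtype=float)
    rank = int(np.linalg.matrix_rank(scaled)) if scaled.shape[1] else 0
    redundant_cols = scaled.shape[1] - rank

    # Columns containing values at or beyond the sentinel magnitude.
    sentinel_cols = [c for c in numeric.columns if (numeric[c].abs() >= sentinel).any()]

    # Identical feature rows that carry more than one target value.
    targets_per_pattern = df.groupby(feature_cols, dropna=False)[target].nunique()
    conflicting_patterns = int((targets_per_pattern > 1).sum())

    return {
        "duplicated_rows": duplicated_rows,
        "constant_cols": constant_cols,
        "redundant_cols": redundant_cols,
        "sentinel_cols": sentinel_cols,
        "conflicting_patterns": conflicting_patterns,
    }
```

Calling `audit(df)` on a training frame returns a dictionary of counts and column lists. A non-zero `redundant_cols` is a warning sign for methods such as Linear Discriminant Analysis that need a well-conditioned matrix.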
