DataMining_CreditScoring

In this project, I implement several machine learning algorithms to improve credit scoring. The training dataset contains more than 150,000 observations and 7 features, and feature engineering is used to expand the feature set. The following models are applied:

Logistic Regression with regularization, Decision Tree, SVM with different kernels, Naive Bayes, and alpha-Tree;
Ensemble methods such as Random Forest and Gradient Boosting.
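As an illustration of the first model in the list, here is a minimal sketch of logistic regression with an L2 penalty, trained by batch gradient descent using only the Python standard library. It is not the code from this repository: the function names (`fit_logistic_l2`, `predict_proba`), the penalty strength `lam`, the learning rate, and the toy data are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_l2(X, y, lam=0.01, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the L2-penalized log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the data gradient and add the penalty gradient (bias is not penalized).
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Predicted probability of the positive class for each row.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Hypothetical toy data: two features, label 1 when both features are large.
X_toy = [[0.0, 0.2], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
y_toy = [0, 0, 1, 1]
w, b = fit_logistic_l2(X_toy, y_toy)
probs = predict_proba(X_toy, w, b)
```

A larger `lam` shrinks the weights toward zero, trading a slightly worse fit on the training data for less sensitivity to noise.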

Since the classes are imbalanced, resampling approaches are applied to balance the dataset before training.
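The README does not say which resampling approach was used. As one possibility, the sketch below shows random oversampling, which duplicates minority-class rows until the classes are the same size. The function name `random_oversample` and the toy data are hypothetical.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw extra rows with replacement to reach the majority-class size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    # Shuffle so the classes are interleaved.
    order = list(range(len(y_out)))
    rng.shuffle(order)
    return [X_out[i] for i in order], [y_out[i] for i in order]


# Hypothetical imbalanced data: 8 negatives, 2 positives.
X_imb = [[i] for i in range(10)]
y_imb = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X_imb, y_imb)
```

Resampling is best applied to the training split only, so that duplicated rows never appear in the data used for evaluation.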

Cross-validation is used to detect and limit over-fitting.
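The exact cross-validation scheme is not stated here. With imbalanced classes, a stratified split is a common choice because each fold keeps roughly the same class proportions. The following is a minimal sketch under that assumption; `stratified_kfold` and the demo labels are hypothetical.

```python
import random


def stratified_kfold(y, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs whose folds keep class proportions roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        valid = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, valid


# Hypothetical labels: 16 negatives and 4 positives, split into 4 folds.
y_demo = [0] * 16 + [1] * 4
splits = list(stratified_kfold(y_demo, k=4))
```

Each model is then fit on `train` and scored on `valid`, and the scores are averaged across folds. A large gap between training and validation scores is a sign of over-fitting.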

When used to predict on the test dataset, the model improves the Area Under the Curve (AUC) from 84% to 91%.
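Assuming the reported metric is the area under the ROC curve, it can be computed from labels and predicted scores as the probability that a randomly chosen positive is ranked above a randomly chosen negative. The sketch below uses the rank-sum form of that definition; the function name `roc_auc` and the demo values are hypothetical and unrelated to the results reported above.

```python
def roc_auc(y_true, scores):
    """ROC AUC: chance that a random positive outscores a random negative (ties count half)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each member the average rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, label in zip(ranks, y_true) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.
auc_demo = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because AUC depends only on the ordering of the scores, it is not inflated by always predicting the majority class, which makes it a reasonable metric for an imbalanced problem like this one.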

README.md
knn.py
preprocessing_1105.py
prolog.R
rbf.py
svm_classifier.py
test.py


zhangpu0703/DataMining_CreditScoring
