Companies Bankruptcy Prediction

Summary

Our study uses financial-ratio data on Polish companies to predict corporate default with machine-learning techniques.

The dataset used in this study is obtained from the UCI Machine Learning Repository. It contains 64 financial ratios and a corresponding class label that indicates bankruptcy status after 2 years. Of the 9792 companies analyzed in this study, 515 (5.26%) went bankrupt, whereas 9277 (94.74%) survived.
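Because bankrupt firms make up only about 5% of the sample, plain accuracy is a misleading metric: a model that always predicts "survived" would already score 94.74%. The short sketch below uses only the counts quoted above to reproduce the class shares and that majority-class baseline. The function names are illustrative and not part of this repository.

```python
# Illustrative sketch: class shares and majority-class baseline from the counts above.


def class_shares(counts):
    """Return each class's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def majority_baseline_accuracy(counts):
    """Accuracy (in percent) of a 'model' that always predicts the most frequent class."""
    return round(100 * max(counts.values()) / sum(counts.values()), 2)


counts = {"bankrupt": 515, "survived": 9277}
print(class_shares(counts))                 # {'bankrupt': 5.26, 'survived': 94.74}
print(majority_baseline_accuracy(counts))  # 94.74
```

This is why metrics focused on the minority class, such as recall on bankrupt firms or the area under the precision-recall curve, are usually more informative for this dataset than accuracy.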

Because the dataset contains only financial ratios rather than raw financial figures, we reverse-engineered the raw financial figures from the ratios.
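The exact reconstruction procedure lives in the project code. As a minimal sketch of the idea, assume (per the UCI attribute description; verify against the project's data dictionary) that `Attr29` is the logarithm of total assets and that `Attr1`, `Attr2` and `Attr3` are net profit, total liabilities and working capital, each divided by total assets. Undoing the logarithm recovers total assets, and every ratio with total assets as its denominator can then be multiplied back into a raw figure. The column mapping, the natural-log base and the function name are all assumptions for illustration.

```python
import math


def reconstruct_raw_figures(row, log_base=math.e):
    """Recover raw figures from ratios for one company (a dict of column -> value).

    Hypothetical mapping, to be checked against the data dictionary:
      Attr29 = log(total assets), with base `log_base` (assumed natural log)
      Attr1  = net profit / total assets
      Attr2  = total liabilities / total assets
      Attr3  = working capital / total assets
    """
    total_assets = log_base ** row["Attr29"]  # undo the logarithm
    return {
        "total_assets": total_assets,
        # Multiply each ratio by its denominator to get the numerator back.
        "net_profit": row["Attr1"] * total_assets,
        "total_liabilities": row["Attr2"] * total_assets,
        "working_capital": row["Attr3"] * total_assets,
    }


example = {"Attr29": math.log(2_000_000), "Attr1": 0.05, "Attr2": 0.6, "Attr3": 0.1}
raw = reconstruct_raw_figures(example)
print(round(raw["net_profit"]))  # 100000
```

Ratios whose denominator is some other figure (for example short-term liabilities) can be unwound the same way once that figure has itself been recovered, so the reconstruction can be chained.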

We find that ensemble techniques such as random forest provide the best results. We then applied SHAP (SHapley Additive exPlanations) to explain the model's output.
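SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. The `shap` library's tree explainer computes these efficiently for tree ensembles such as random forests; the sketch below does not use that library. Instead it computes exact Shapley values by brute force for a small, made-up model, replacing "absent" features with values from a single baseline point, to show what the attributions mean. The toy model, the baseline and the feature values are illustrative assumptions only.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features in a coalition S keep their values from x; features outside S
    take their values from baseline. Cost is O(2^n), so only for small n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy "risk score" with an interaction between features 0 and 1.
def toy_model(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[1] + 0.3 * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
# Efficiency property: the attributions sum to model(x) - model(baseline).
print(sum(phi), toy_model(x) - toy_model(baseline))
```

Here the attributions come out as approximately 2.5, -1.5 and 0.9: each linear term contributes w_i(x_i - b_i), and the interaction term 0.5·z0·z1 (worth 1.0 at x) is split equally between features 0 and 1.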

The best way to step through our work is to read the notebooks in the notebooks folder.

Data

The source data is obtained from the UCI Machine Learning Repository.

We created a data dictionary to map the given column names to financial ratios:
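As a hedged illustration of how such a dictionary can be applied, the snippet below renames columns with a plain Python dict. The `Attr…` column names follow the UCI ARFF files, and the few ratio names shown are taken from the UCI attribute description as an assumption, not copied from the project's own dictionary.

```python
# Hypothetical excerpt of a data dictionary; verify entries against the project's own.
DATA_DICTIONARY = {
    "Attr1": "net profit / total assets",
    "Attr2": "total liabilities / total assets",
    "Attr3": "working capital / total assets",
    "Attr29": "logarithm of total assets",
}


def rename_columns(columns, dictionary=DATA_DICTIONARY):
    """Map raw column names to readable ratio names, leaving unknown names unchanged."""
    return [dictionary.get(col, col) for col in columns]


print(rename_columns(["Attr1", "Attr29", "class"]))
# ['net profit / total assets', 'logarithm of total assets', 'class']
```

With pandas, the same dict can be passed directly to `DataFrame.rename(columns=DATA_DICTIONARY)`.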

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

conda environment

To set up the conda environment, run:

conda env create -f environment.yml
conda activate company_default

If any additional packages are required, add them to the yml file and run:

conda env update -f environment.yml --prune

To create the kernel for jupyter notebooks, run:

conda activate company_default
python -m ipykernel install --user --name company_default --display-name "Python (company_default)"
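After selecting the "Python (company_default)" kernel in Jupyter, one quick way to confirm that the notebook really runs inside the conda environment is to inspect the interpreter prefix. This is a minimal sketch, assuming the environment was created as a named environment, so that it lives in a folder called company_default; `running_in_env` is a hypothetical helper, not part of the repository.

```python
import sys
from pathlib import Path


def running_in_env(env_name, prefix=sys.prefix):
    """Return True if the interpreter prefix is a conda env folder called env_name.

    Named conda environments are normally installed under .../envs/<env_name>,
    so the last path component of sys.prefix is the environment name.
    """
    return Path(prefix).name == env_name


print(running_in_env("company_default"))
```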

pre-commit

To set up pre-commit (which is used to run black before committing to Git), run:

pre-commit install

Add the following line to the first cell of your notebook to apply black formatting to that notebook:

%load_ext nb_black

input/output (io)

To set up the io, create a data folder in the root directory, which should have the following structure:

data/
- input/
  - train.csv
  - test.csv
- output/

The input folder contains train.csv and test.csv, while the output folder will have the pipeline output.

Table	Description
train.csv	Labelled dataset used to train the model
test.csv	Unlabelled dataset to make predictions on
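The folder layout and the two input files described above can be set up and read from Python. A minimal sketch using `pathlib` and the standard-library `csv` module: `setup_io` creates the data folders and reports which input files are still missing, and the loader separates the label only for the labelled training file. The function names and the label column name `class` (as in the UCI files) are assumptions for illustration.

```python
import csv
from pathlib import Path

REQUIRED_INPUTS = ("train.csv", "test.csv")
LABEL = "class"  # assumed name of the bankruptcy label column in train.csv


def setup_io(root="."):
    """Create data/input and data/output under root; return missing input files."""
    data = Path(root) / "data"
    (data / "input").mkdir(parents=True, exist_ok=True)
    (data / "output").mkdir(parents=True, exist_ok=True)
    return [name for name in REQUIRED_INPUTS if not (data / "input" / name).exists()]


def read_table(path):
    """Read a CSV file into a list of dicts, one per company (values stay strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def split_features_labels(rows, label=LABEL):
    """Split labelled rows into (features, labels); labels become ints (0/1)."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [int(row[label]) for row in rows]
    return features, labels
```

Typical use: call `setup_io()` from the project root, copy any files it reports as missing into data/input/, then use `split_features_labels(read_table("data/input/train.csv"))` for training and `read_table("data/input/test.csv")` for prediction. Feature values are left as strings here and need converting to floats before modelling; with pandas, `pd.read_csv` handles that conversion automatically.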

Repository: waijean/company_default

Folders and files

Latest commit

History

Repository files navigation

Companies Bankruptcy Prediction

Summary

Data

Getting Started

conda environment

pre-commit

input/output (io)

About

Resources

Stars

Watchers

Forks

Languages