Uses financial ratios data to predict probability of default for companies

waijean/company_default

Companies Bankruptcy Prediction

Summary

This study uses financial-ratio data for Polish companies to predict default with machine-learning techniques.

The dataset is obtained from the UCI Machine Learning Repository. It contains 64 financial ratios per company and a class label that indicates bankruptcy status after 2 years. Of the 9792 companies analyzed in this study, 515 (5.26%) went bankrupt, whereas 9277 (94.74%) survived.
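The class split quoted above implies a strongly imbalanced problem. The short sketch below re-derives the percentages and shows the accuracy a trivial "always survives" model would reach. The inverse-frequency class weights at the end are a common heuristic included only for illustration; they are an assumption, not necessarily what the notebooks use.

```python
# Counts taken from the summary above.
total_companies = 9792
bankrupt = 515
survived = 9277

bankrupt_share = bankrupt / total_companies
survived_share = survived / total_companies
print(f"bankrupt: {bankrupt_share:.2%}, survived: {survived_share:.2%}")
# prints "bankrupt: 5.26%, survived: 94.74%"

# A model that always predicts the majority class reaches this accuracy
# without learning anything, so accuracy alone is a weak metric here.
majority_baseline_accuracy = max(bankrupt_share, survived_share)

# Assumption for illustration: inverse-frequency class weights,
# n_samples / (n_classes * class_count). Not confirmed by the source.
counts = {"bankrupt": bankrupt, "survived": survived}
class_weights = {
    label: total_companies / (len(counts) * count)
    for label, count in counts.items()
}
```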

Because the dataset contains only financial ratios rather than raw financial figures, we attempted to reverse-engineer the raw figures from the ratios, and succeeded.

We find that ensemble techniques such as random forest provide the best results. Furthermore, we applied the SHAP (SHapley Additive exPlanations) technique to explain the output of the model.
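SHAP attributions are built on Shapley values. As a rough illustration of the idea only, the sketch below computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over every feature ordering. The function `toy_score`, its three feature names, and all numbers are hypothetical; this is not the project's model, and the shap library uses much faster algorithms than this brute-force loop.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features not yet "switched on" take their value from `baseline`.
    Cost grows factorially with the number of features, so this is only
    practical for a handful of them.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [total / len(orderings) for total in totals]


def toy_score(x):
    # Hypothetical risk score with one interaction term (illustration only).
    leverage, profitability, liquidity = x
    return 0.5 * leverage - 0.3 * profitability + 0.2 * leverage * liquidity


instance = [0.8, 0.1, 0.5]  # hypothetical company
baseline = [0.4, 0.2, 1.0]  # hypothetical reference company
phi = shapley_values(toy_score, instance, baseline)
```

The attributions in `phi` sum to `toy_score(instance) - toy_score(baseline)`, which is the additivity property that makes per-feature explanations of a single prediction possible.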

The best way to step through our work is to view the notebooks.

Data

The source data is obtained from the UCI Machine Learning Repository.

We created a data dictionary to map the given column names to financial ratios:

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

conda environment

To set up the conda environment, run:

conda env create -f environment.yml
conda activate company_default

If any additional packages are required, add them to environment.yml and run:

conda env update -f environment.yml --prune

To create the kernel for jupyter notebooks, run:

conda activate company_default
python -m ipykernel install --user --name company_default --display-name "Python (company_default)"

pre-commit

To set up pre-commit (which is used to run black before committing to Git), run:

pre-commit install 

Add the following as the first line of your notebook to apply black formatting to that notebook:

%load_ext nb_black

input/output (io)

To set up the io, create a data folder in the root directory, which should have the following structure:

  • data/
    • input/
      • train.csv
      • test.csv
    • output/

The input folder contains train.csv and test.csv, while the output folder will have the pipeline output.

Table      Description
train.csv  Labelled dataset used to train the model
test.csv   Unlabelled dataset to make predictions for
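The folder layout described above can be created and checked with a few lines of standard-library Python. This is a minimal sketch; the helper name `prepare_data_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Input files the README expects under data/input/.
EXPECTED_INPUTS = ("train.csv", "test.csv")


def prepare_data_dirs(root):
    """Create data/input and data/output under `root`.

    Returns the names of expected input files that are not present yet,
    so the caller knows what still has to be copied in.
    """
    data_dir = Path(root) / "data"
    input_dir = data_dir / "input"
    output_dir = data_dir / "output"
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    return [name for name in EXPECTED_INPUTS if not (input_dir / name).is_file()]
```

Calling `prepare_data_dirs(".")` from the repository root creates the folders if needed and returns, for example, `["train.csv", "test.csv"]` until both files have been placed in data/input/.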
