Introduction

The bank-data.csv file contains 600 rows corresponding to bank customers, and 11 columns that describe each customer's family, basic demographics, and current banking products. The target column, pep, indicates whether the customer purchased a Personal Equity Plan after the most recent promotional campaign.

The overall goal of this analysis is to predict which customers are likely to purchase a PEP.

This is a more in-depth example than the Quickstart and more closely maps to a traditional data science workflow.

We proceed as follows:

Define a schema for the table
Load the data from csv and divide it into training and test subsets
Connect to the Veritable API
Create a Veritable Table and upload training rows
Create a Veritable Analysis and wait for it to complete
For each row in the test set, predict the value and uncertainty for the target column
Evaluate prediction accuracy using different maximum uncertainty thresholds

To run this demo, clone it locally and then:

pip install -r requirements.txt
pip install -e .
python -m bank_data.run

Files

run.py: A python script that divides the data in training and test sets, creates the table and analysis, and runs predictions on the test data
bank-data.csv: A csv file containg the bank dataset

Dataset Description

600 rows, each a bank customer
11 columns
some missing values
all column types: boolean, categorical, real, and count
target column is pep: whether the bank customer purchased a Personal Equity Product

Usage

$ export VERITABLE_KEY=yourapikey
$ python run.py

Output

The run.py script will print its progress and the results to the console:

Creating table 'bank-data-example' and uploading rows
Creating analysis 'main-analysis' and waiting for it to complete
Making predictions
Predictions for pep are 55% (33/60) correct with 0% (0/60) ignored using a maximum uncertainty of 0.5
Predictions for pep are 65% (17/26) correct with 57% (34/60) ignored using a maximum uncertainty of 0.4
Predictions for pep are 85% (6/7) correct with 88% (53/60) ignored using a maximum uncertainty of 0.3

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
bank_data		bank_data
test		test
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bank_data

bank_data

test

test

.gitignore

.gitignore

README.md

README.md

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Introduction

Files

Dataset Description

Usage

Output

About

Releases

Packages

veronicalimpooikhoon/bank-data-example

Folders and files

Latest commit

History

Repository files navigation

Introduction

Files

Dataset Description

Usage

Output

About

Resources

Stars

Watchers

Forks