Fraud Case Study For Galvanize

Our work is presented in fraud.pdf.

For this two-day case study, my team and I engineered an end-to-end pipeline for predicting fraud from streaming data.

The Process

We spent most of the first day on exploratory data analysis (EDA): getting to know our data and choosing the most important of the original 44 features. Our training dataset consisted of 14,337 events, each labeled either as some type of fraud or as not fraud. We spent the second day modeling, performing a cost analysis, and building a Flask web app that uses the streaming data to predict fraud.

Modeling

After choosing the features we wanted to keep, we used RandomUnderSampler (from imbalanced-learn) to correct our class imbalance and improve the accuracy of our model. We then tested different models, using F1 score as our performance metric. We started with Naive Bayes, then moved on to Logistic Regression, Decision Tree, Gradient Boosted Trees, and Random Forest.
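To illustrate what random undersampling does, here is a minimal standard-library sketch. It is not the code from this repo and not imbalanced-learn's implementation. The function name `random_undersample` and the toy data are made up for the example. It assumes the goal is to shrink every class to the size of the smallest one.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is cut down to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    kept.sort()  # preserve the original row order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (hypothetical): 90 legitimate events and 10 fraudulent ones.
toy_rows = [[i] for i in range(100)]
toy_labels = [0] * 90 + [1] * 10

balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # prints: 10 10
```

Resampling is usually applied to the training split only, so that the evaluation data keeps the real class balance.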

Most of these models gave good accuracy, but we chose a Random Forest Classifier because it gave us the highest F1 score, 0.94.
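As a rough illustration of why F1 can be a more informative metric than accuracy for fraud data, the sketch below computes both by hand. The labels are invented for the example, and `f1_score_binary` and `accuracy` are hypothetical helpers, not functions from this repo.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (hypothetical): 1 = fraud, 0 = not fraud.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # catches 3 of 4 frauds, 1 false alarm
y_never = [0] * 10                         # never predicts fraud

print(accuracy(y_true, y_model), f1_score_binary(y_true, y_model))  # 0.8 0.75
print(accuracy(y_true, y_never), f1_score_binary(y_true, y_never))  # 0.6 0.0
```

A model that never flags fraud still gets 60% accuracy on this toy set, while its F1 score is zero.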

Cost Analysis

For the cost analysis, we had to make a few assumptions:

Average payment per customer: $1,100
Chance that an investigated customer churns (due to the inconvenience): 1/100
Average cost of predicting fraud when there is none (false positive): $220
Average cost of predicting not fraud when there is fraud (false negative): $1,100
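One way to turn the two error costs assumed above into a single number per model is sketched below. It assumes the total cost is simply the count of each error type multiplied by its average cost. The error counts for the two models are hypothetical, and `total_cost` is an illustrative helper rather than the repo's own cost function.

```python
# Average error costs taken from the assumptions listed above.
FALSE_POSITIVE_COST = 220    # dollars per event wrongly flagged as fraud
FALSE_NEGATIVE_COST = 1100   # dollars per fraud that goes undetected


def total_cost(false_positives, false_negatives):
    """Total dollar cost of a model's mistakes under a linear cost model."""
    return (false_positives * FALSE_POSITIVE_COST
            + false_negatives * FALSE_NEGATIVE_COST)


# Hypothetical error counts for two candidate models on the same test set.
cost_a = total_cost(false_positives=50, false_negatives=10)
cost_b = total_cost(false_positives=10, false_negatives=30)

print(cost_a)  # 22000
print(cost_b)  # 35200

# Under these assumptions one missed fraud costs as much as five false alarms.
print(FALSE_NEGATIVE_COST / FALSE_POSITIVE_COST)  # 5.0
```

With these costs, a model that raises more false alarms but misses fewer frauds (model A here) can be the cheaper choice.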
