Welcome to my Project Repository!

My name is Kari Davis, and I am a data scientist with a background in math and statistics. I have experience in insurance rate analysis and trending. I enjoy drawing data-driven insights in areas like sports and technology, and telling a story with data. You can find more on my website at https://kari.codes/. All five of my data science projects are in the folders above, and a brief overview of each follows. Thanks for viewing!

Savings Recommender: For this project, I built a content-based recommendation system that makes recommendations from your personal credit card transaction history. The system recommends businesses that are most similar to those you frequent but are cheaper and have the same or better rating on Yelp or Google. The recommendations are based on Yelp review topic vectors and business attributes, and are further weighted by distance and category matching. This project uses Pandas, scikit-learn, Flask, and both the Yelp and Google APIs.
Pitcher Injury Prediction: For this project, I used a linear SVM to classify MLB pitcher injuries based on each player's characteristics, game stats, and pitching repertoire. I found that the share of certain pitches thrown, such as sinkers and cutters, tended to increase in the years leading up to an injury. This project uses Pandas, PostgreSQL, scikit-learn, Flask, and Tableau, along with oversampling.
Customer Support NLP: This project explores customer support conversations on Twitter using VADER sentiment analysis, NMF topic modeling, and unsupervised clustering with DBSCAN. PCA dimensionality reduction was used for visualization and showed that the clusters successfully grouped similar business types based on their topic vectors. This project uses Pandas, Plotly, scikit-learn, Gensim, VADER, Tableau, and MongoDB.
Vinyl Resale Regression: For this project, I used a linear regression model to predict vinyl resale prices from data gathered from Discogs.com and to determine which features matter most for resale. I used the Spotify API to pull artist popularity data and combined it with the Discogs data. This project uses Pandas, NumPy, Seaborn, Matplotlib, BeautifulSoup, Selenium, and scikit-learn.
MTA Turnstile Project: This project explores MTA turnstile data to find the most-trafficked stations and the best time periods in which to advertise for an event. I supplemented the MTA turnstile data with NYC census data to find the top 10 stations in areas with demographics relevant to our event, such as higher percentages of women, higher-income neighborhoods, and locations near tech companies. This project uses Pandas, NumPy, Matplotlib, and Seaborn.
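
As a rough illustration of the ranking idea behind the Savings Recommender described above, here is a minimal sketch in plain Python. It is not the project's actual code: the `Business` fields, the exponential distance decay, the `distance_scale_km` and `category_bonus` values, and the sample businesses are all assumptions made up for this example. It keeps only candidates that are cheaper and rated at least as well as a business you frequent, scores them by cosine similarity of their review topic vectors, and then weights the score by distance and category match.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Business:
    # Hypothetical record layout, assumed for this sketch only.
    name: str
    category: str
    price_level: int      # 1 (cheapest) to 4 (most expensive)
    rating: float         # e.g. a Yelp or Google star rating
    distance_km: float    # distance from the user
    topics: List[float]   # review topic vector


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(frequented: Business, candidates: List[Business], top_n: int = 3,
              distance_scale_km: float = 5.0, category_bonus: float = 1.25) -> List[str]:
    """Rank cheaper, equally or better rated businesses by weighted similarity."""
    scored = []
    for c in candidates:
        # Keep only businesses that are cheaper and rated the same or better.
        if c.price_level >= frequented.price_level or c.rating < frequented.rating:
            continue
        score = cosine(frequented.topics, c.topics)
        # Assumed weighting: nearer businesses score higher (exponential decay).
        score *= math.exp(-c.distance_km / distance_scale_km)
        # Assumed weighting: boost businesses in the same category.
        if c.category == frequented.category:
            score *= category_bonus
        scored.append((score, c.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


usual = Business("Corner Cafe", "cafe", 3, 4.0, 0.0, [0.8, 0.1, 0.1])
candidates = [
    Business("Budget Beans", "cafe", 2, 4.2, 1.0, [0.7, 0.2, 0.1]),
    Business("Pricey Roast", "cafe", 4, 4.8, 0.5, [0.8, 0.1, 0.1]),  # too expensive
    Business("Cheap Eats", "diner", 1, 4.1, 1.0, [0.1, 0.1, 0.8]),
    Business("Low Rated", "cafe", 1, 3.0, 0.2, [0.8, 0.1, 0.1]),     # rating too low
]
print(recommend(usual, candidates))  # ['Budget Beans', 'Cheap Eats']
```

In this sketch the price and rating conditions act as hard filters, while distance and category only adjust the similarity score, so a nearby business in a different category can still be recommended if its reviews are similar enough.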

