wahahahah/projects

 
 

Welcome to my Project Repository!

My name is Kari Davis, and I am a Data Scientist with a background in Math and Statistics. I have experience in insurance rate analysis and trending. I love finding data-driven insights in areas like sports and technology, and telling a story with data. You can find more on my website at https://kari.codes/. All five of my data science projects are in the folders above, and a brief overview of each follows. Thanks for viewing!

  • Savings Recommender: For this project, I built a content-based recommendation system that makes suggestions based on your personal credit card transaction history. The system recommends businesses that are most similar to those you frequent but are cheaper and have the same or a better rating on Yelp/Google. The recommendations are based on Yelp review topic vectors and business attributes, and are further weighted by distance and category matching. This project uses Pandas, scikit-learn, Flask, and both the Yelp and Google APIs.

  • Pitcher Injury Prediction: For this project, I used a Linear SVM to classify MLB pitcher injuries based on the player's characteristics, game stats, and pitching repertoire. I found that certain pitches, such as Sinkers and Cutters, tended to make up a growing percentage of pitches thrown in the years leading up to an injury. This project uses Pandas, PostgreSQL, scikit-learn, Flask, Tableau, and oversampling.

  • Customer Support NLP: This project explores customer support conversations on Twitter with VADER Sentiment Analysis, NMF Topic Modeling, and unsupervised clustering with DBSCAN. For visualization, I used PCA dimensionality reduction, which showed that the clusters successfully grouped similar business types based on the topic vectors. This project uses Pandas, Plotly, scikit-learn, Gensim, VADER, Tableau, and MongoDB.

  • Vinyl Resale Regression: For this project, I used a Linear Regression model to predict vinyl resale prices from data gathered from Discogs.com and to determine feature importance for resale. I used the Spotify API to pull artist popularity data and combined it with the Discogs data. This project uses Pandas, NumPy, Seaborn, Matplotlib, BeautifulSoup, Selenium, and scikit-learn.

  • MTA Turnstile Project: This project explores MTA Turnstile Data to find the most trafficked stations and the best time periods in which to advertise an event. I supplemented the MTA Turnstile Data with NYC Census data to find the top 10 stations in areas with target demographics relevant to our event, such as higher percentages of women, higher-income neighborhoods, and locations near tech companies. This project uses Pandas, NumPy, Matplotlib, and Seaborn.
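
To make the Savings Recommender idea above more concrete, here is a minimal sketch of the filter-and-rank step. It is not the project's actual code: the field names (`topics`, `price`, `rating`, `category`, `distance_km`), the `category_bonus` value, the linear distance decay, and the sample businesses are all hypothetical choices made for illustration, and it uses only the Python standard library rather than Pandas or scikit-learn.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(visited, candidates, max_km=5.0, category_bonus=0.2):
    """Rank candidates that are cheaper than `visited` and rated at least as well."""
    results = []
    for c in candidates:
        # Hard filter: must be cheaper and have the same or a better rating.
        if c["price"] >= visited["price"] or c["rating"] < visited["rating"]:
            continue
        # Content similarity from the review topic vectors.
        score = cosine(visited["topics"], c["topics"])
        # Assumed weighting: a flat bonus when the category matches.
        if c["category"] == visited["category"]:
            score += category_bonus
        # Assumed weighting: linear decay to zero at max_km.
        score *= max(0.0, 1.0 - c["distance_km"] / max_km)
        results.append((c["name"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Made-up example data: one frequently visited business and four candidates.
visited = {"name": "Cafe A", "topics": [0.9, 0.1, 0.0],
           "price": 3, "rating": 4.0, "category": "coffee"}
candidates = [
    {"name": "Cafe B", "topics": [0.8, 0.2, 0.0], "price": 2,
     "rating": 4.5, "category": "coffee", "distance_km": 1.0},
    {"name": "Cafe C", "topics": [0.9, 0.1, 0.0], "price": 3,
     "rating": 4.8, "category": "coffee", "distance_km": 0.3},  # not cheaper
    {"name": "Diner D", "topics": [0.1, 0.9, 0.0], "price": 1,
     "rating": 4.2, "category": "diner", "distance_km": 0.5},
    {"name": "Cafe E", "topics": [0.9, 0.1, 0.0], "price": 2,
     "rating": 3.5, "category": "coffee", "distance_km": 0.2},  # lower rating
]

recs = recommend(visited, candidates)
for name, score in recs:
    print(f"{name}: {score:.3f}")
```

Price and rating act as hard filters here, while distance and category only adjust the similarity score. That is one reasonable reading of "weighted by distance and category matching", not necessarily the one the project uses.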

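The Pitcher Injury Prediction overview mentions oversampling without saying which method was used. As one possible illustration, the sketch below shows plain random oversampling, where minority-class rows are duplicated until every class is as large as the biggest one. The function name `random_oversample`, the toy feature rows, and the 0/1 labels are hypothetical, and the project itself may have used a different technique or a library implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement to make up the shortfall.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Made-up data: 6 "healthy" rows (label 0) and 2 "injured" rows (label 1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

If a step like this is used, it should be applied only to the training split, so that duplicated rows do not leak into the evaluation data.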
Languages

  • Jupyter Notebook 99.6%
  • Python 0.4%