GitHub - haticebakir/StarbucksMachineLearning: A regression and clustering study was performed on Starbucks data. An RFM analysis was also performed.

A regression and clustering study was performed on Starbucks data, along with an RFM analysis.

This study applies supervised and unsupervised machine learning methods. In the supervised part, regression analysis was carried out with linear regression, Random Forest and SVR, and their performances were compared. In the unsupervised part, a clustering study examined the performance of the K-Means, OPTICS and DBSCAN algorithms. The aim is to measure and compare the performance of the selected regression and clustering algorithms on the data set, and to examine how customers behave in response to company offers by segmenting them.

The data set contains customer data from a large coffee company's loyalty-card holders, including purchasing habits and interactions with promotional offers. It consists of three separate files: profile, portfolio and transcript. Profile describes the customers, with information such as age, gender and income. Portfolio contains information about the offers sent to customers via web, email, mobile and social media channels. Transcript lists offer interactions and all other actions; an event is recorded when a customer receives, views or completes an offer.

In the regression analysis, Random Forest was the best-performing algorithm. In the cluster analysis, the unsupervised part of the study, OPTICS and DBSCAN scored highest on the performance metrics. However, K-Means was preferred given the intended use of the method.
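The regression comparison described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the code from the notebooks: the synthetic `age`, `income` and `is_female` features, the hypothetical spend target, and all hyperparameters are assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the customer table (NOT the Starbucks data).
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 80, size=n)
income = rng.normal(65_000, 20_000, size=n)
is_female = rng.integers(0, 2, size=n)
X = np.column_stack([age, income, is_female])
# Hypothetical spend target: driven mostly by income, plus noise.
y = 0.0002 * income + 0.05 * age + rng.normal(0, 2, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Linear regression and SVR are wrapped with a scaler because the
# features live on very different scales; the forest does not need it.
models = {
    "LinearRegression": make_pipeline(StandardScaler(), LinearRegression()),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    results[name] = {
        "r2_train": r2_score(y_train, pred_train),
        "r2_test": r2_score(y_test, pred_test),
        "mse_test": mean_squared_error(y_test, pred_test),
    }

for name, m in results.items():
    print(f"{name}: R2 train={m['r2_train']:.3f} "
          f"R2 test={m['r2_test']:.3f} MSE test={m['mse_test']:.3f}")
```

Comparing the training and test R2 for each model gives a rough view of over- or under-fitting. Which model ranks best depends on the data, so the ordering on this synthetic set says nothing about the study's results.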
To investigate the bias of the machine learning models on the training and test sets, R2 and Mean Squared Error (MSE) were used for regression, and the Silhouette Coefficient and SSE (sum of squared errors) were used for clustering. Based on the experimental results, the cluster analysis yielded four clusters. These clusters provide information about the profiles of the coffee company's customers and can help the company decide which discount, promotional and informational offers to send them. The resulting customer segments can guide how the company treats its customers from now on.

The regression analysis produced results showing the effect of customers' demographic characteristics on their purchasing habits. Although men tend to buy more, women with higher incomes prefer more expensive products. Age and gender provide information about customer spending, but income is an important factor in trying to predict coffee prices.
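The clustering metrics mentioned above can be illustrated with a short sketch. It assumes scikit-learn and uses four synthetic, well-separated blobs in place of the customer features; the blob centers, the range of `k` and the variable names are hypothetical and not taken from the notebooks. SSE is read from K-Means' `inertia_` attribute, and the Silhouette Coefficient is computed with `silhouette_score`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the segmentation features (NOT the Starbucks data):
# four well-separated groups in two dimensions.
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8,
                  random_state=0)
X = StandardScaler().fit_transform(X)

sse = {}         # sum of squared distances to the nearest centroid
silhouette = {}  # mean Silhouette Coefficient, in [-1, 1]
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = km.inertia_
    silhouette[k] = silhouette_score(X, km.labels_)

# Pick the k with the highest silhouette; SSE alone always favors larger k,
# so it is usually read via the "elbow" in the curve instead.
best_k = max(silhouette, key=silhouette.get)

for k in sse:
    print(f"k={k}: SSE={sse[k]:.2f} silhouette={silhouette[k]:.3f}")
print("best k by silhouette:", best_k)
```

On real customer data the clusters are rarely this clean, so the silhouette peak and the SSE elbow may disagree, and the final choice of `k` is a judgment call.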

Folders and files (32 commits):
.ipynb_checkpoints
__pycache__
data
.library.json
Proje2.ipynb
README.md
helpers.py
main_cluster.csv
portfolio.json
portfolio1.json
portfolio_profile_transcript_cleaned.csv
profile.json
profile1.json
proje1.ipynb
proje3.ipynb
proje4.ipynb
proje_bolum_1.ipynb
proje_bolum_2.ipynb
projedeneme.ipynb
transcript1.json
transcriptger.json
transcriptolan.json
