A regression and clustering study performed on Starbucks data, together with an RFM analysis.

haticebakir/StarbucksMachineLearning

This study applies regression and cluster analysis to Starbucks data, using supervised and unsupervised machine learning approaches, and also performs an RFM analysis. In the supervised learning part, regression analysis was carried out with linear regression, Random Forest and SVR, and their performances were compared. In the unsupervised learning part, a clustering study was carried out and the performances of K-Means, OPTICS and DBSCAN were examined.

The aim of the study is to measure and compare the performance of the selected regression and clustering algorithms on the data set. It also aims to segment customers and examine how each segment behaves in response to the offers the company sends.

The data set contains customer data from a large coffee company's loyalty card holders, including purchasing habits and interactions with promotional offers. It consists of three separate files: profile, portfolio and transcript. The profile data set describes the customers, with information such as age, gender and income. The portfolio data set contains information about the offers sent to customers via web, email, mobile and social media channels. The transcript data set lists offer interactions and all other actions. It records when a customer receives, views and completes an offer.

In regression analysis, the best-performing algorithm was Random Forest. In cluster analysis, the unsupervised learning approach, OPTICS and DBSCAN scored highest on the performance metrics. However, K-Means was preferred because it better suited the purpose of the study.

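
The RFM analysis mentioned above is not described in detail here. As a rough illustration of the idea only, the sketch below computes recency, frequency and monetary values per customer from a list of purchase records. The record layout `(customer_id, time_in_hours, amount_spent)`, the sample values and the `rfm_table` helper are assumptions made for this example, not the repository's actual code or the real column names of the transcript file.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_id, time_in_hours, amount_spent).
# Layout and values are illustrative assumptions, not taken from the data set.
transactions = [
    ("c1", 10, 5.0),
    ("c1", 200, 12.5),
    ("c2", 50, 3.0),
    ("c3", 300, 20.0),
    ("c3", 320, 7.5),
    ("c3", 330, 2.5),
]


def rfm_table(records, now):
    """Return {customer: (recency, frequency, monetary)}.

    recency   = time elapsed between the last purchase and `now`
    frequency = number of purchases
    monetary  = total amount spent
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, time, amount in records:
        last_seen[customer] = max(time, last_seen.get(customer, time))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: (now - last_seen[customer], frequency[customer], monetary[customer])
        for customer in last_seen
    }


# `now` is an assumed end of the observation window, in the same time unit.
rfm = rfm_table(transactions, now=336)
for customer, (recency, freq, spend) in sorted(rfm.items()):
    print(customer, recency, freq, spend)
```

In this toy data, `c3` bought most recently, most often and spent the most, so it would rank highest on all three dimensions. A real analysis would typically bin each dimension into scores before segmenting.
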
To evaluate the models and to investigate bias between the training and test sets, R² and Mean Squared Error (MSE) were used for regression, and the Silhouette Coefficient and SSE (sum of squared errors) were used for clustering.

The cluster analysis produced four clusters. These clusters provide information about the profiles of the coffee company's customers and can help the company decide which discount, promotional and informational offers to send to them. The resulting customer segments are expected to guide how the company approaches its customers from now on.

The regression analysis produced results showing how customers' demographic characteristics affect their purchasing habits. Although men tend to buy more, women, with higher incomes, prefer more expensive products. Age and gender provide information about customer spending, but income is an important factor when trying to predict coffee prices.
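
The four metrics named above can be written out in a few lines each. The following is a minimal pure-Python sketch using their standard definitions, assuming Euclidean distance for the clustering metrics. The sample values, the function names and the convention of scoring a single-member cluster as 0 are choices made for this example, not the study's code or data.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def sse(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        dists = {}  # cluster label -> distances from p to that cluster's other points
        for j, (q, label) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(label, []).append(math.dist(p, q))
        if own not in dists:  # single-member cluster: scored as 0 here
            scores.append(0.0)
            continue
        a = sum(dists[own]) / len(dists[own])  # mean distance within own cluster
        b = min(sum(d) / len(d) for label, d in dists.items() if label != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Illustrative values only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]

print("MSE:", mse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
print("SSE:", sse(points, labels))
print("Silhouette:", silhouette(points, labels))
```

Lower MSE and SSE indicate a tighter fit, while R² and the silhouette coefficient are better the closer they are to 1. Comparing R² or MSE between the training and test sets is one common way to spot overfitting.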
