lizzzfang/Customer-Analytics

Customer-Analytics

Analyzing customer behavior with predictive analytics

Introduction

Customer Lifetime Value (CLV)

The CLV analysis uses the Python library lifetimes.

Customer Segmentation

Customer segmentation is the act of separating the target customers into different groups based on demographic or behavioral data so that marketing strategies can be tailored more specifically to each group.

Clustering is used for performing this segmentation.

K-Means for clustering continuous data

K-Modes for clustering categorical data

K-Prototypes for clustering mixed-type data
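
The three algorithms differ mainly in how they measure dissimilarity between two records. The sketch below illustrates the usual textbook measures in plain Python; it is not code from this repository. The function names, the example values and the `gamma` weight are illustrative assumptions.

```python
def euclidean_sq(a, b):
    """Squared Euclidean distance between numeric vectors (K-Means)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matching(a, b):
    """Number of mismatching categorical attributes (K-Modes)."""
    return sum(x != y for x, y in zip(a, b))


def mixed(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Numeric distance plus gamma-weighted mismatches (K-Prototypes).

    gamma is an assumed tuning weight that balances the two parts.
    """
    return euclidean_sq(a_num, b_num) + gamma * matching(a_cat, b_cat)


# Hypothetical customers: (numeric features), (categorical features)
d_num = euclidean_sq((1.0, 2.0), (4.0, 6.0))
d_cat = matching(("UK", "gift"), ("UK", "home"))
d_mix = mixed((1.0, 2.0), ("UK", "gift"), (4.0, 6.0), ("UK", "home"), gamma=0.5)
```

A larger `gamma` gives the categorical attributes more influence on cluster assignment, so in practice it would need tuning to the scale of the numeric features.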

Project

Data Preprocessing

The data has 541909 rows and 8 columns.

Data Dictionary

| Column Name | Type | Description |
| --- | --- | --- |
| InvoiceNo | Nominal | Invoice number, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation. |
| StockCode | Nominal | Product (item) code, a 5-digit integral number uniquely assigned to each distinct product. |
| Description | Nominal | Product (item) name. |
| Quantity | Numeric | The quantity of each product (item) per transaction. |
| InvoiceDate | Numeric | Invoice date and time, the day and time when each transaction was generated. |
| UnitPrice | Numeric | Unit price, the product price per unit in sterling. |
| CustomerID | Nominal | Customer number, a 5-digit integral number uniquely assigned to each customer. |
| Country | Nominal | Country name, the name of the country where each customer resides. |

Data Cleaning

  1. Drop duplicates
  2. Remove records with blank customerID
  3. Handle cancelled transactions:
    • Identify the cancelled orders (Quantity < 0).
    • Find the original order for each cancellation: a record with an earlier InvoiceDate whose other column values are the same.
    • For cancelled records with one counterpart: delete
    • For cancelled records with multiple counterparts: delete the most recent transaction
  4. Create a Sales column (Quantity * UnitPrice)
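
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the repository's code, and it rests on several assumptions. "Other columns are the same" is read as equal CustomerID, StockCode and UnitPrice, with Quantity equal in magnitude. Each cancellation removes at most one original order. The cancellation rows themselves are dropped from the output. The sample rows are made up.

```python
from datetime import datetime


def clean_transactions(rows):
    """Apply the four cleaning steps to a list of transaction dicts."""
    # 1. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 2. Remove records with a blank CustomerID.
    rows = [r for r in unique if r.get("CustomerID") not in (None, "")]

    # 3. Match each cancellation (Quantity < 0) to an earlier purchase.
    purchases = [r for r in rows if r["Quantity"] > 0]
    cancels = [r for r in rows if r["Quantity"] < 0]
    removed = set()  # indices of purchases that were cancelled
    for c in cancels:
        candidates = [
            i for i, p in enumerate(purchases)
            if i not in removed
            and p["CustomerID"] == c["CustomerID"]
            and p["StockCode"] == c["StockCode"]
            and p["UnitPrice"] == c["UnitPrice"]
            and p["Quantity"] == -c["Quantity"]
            and p["InvoiceDate"] < c["InvoiceDate"]
        ]
        if candidates:
            # With several counterparts, delete the most recent one.
            removed.add(max(candidates, key=lambda i: purchases[i]["InvoiceDate"]))
    kept = [p for i, p in enumerate(purchases) if i not in removed]

    # 4. Create the Sales column.
    return [{**p, "Sales": p["Quantity"] * p["UnitPrice"]} for p in kept]


def row(invoice, stock, qty, when, price, customer):
    """Build one hypothetical transaction record."""
    return {"InvoiceNo": invoice, "StockCode": stock, "Quantity": qty,
            "InvoiceDate": when, "UnitPrice": price, "CustomerID": customer}


sample = [
    row("536365", "85123", 6, datetime(2010, 12, 1, 8, 26), 2.5, "17850"),
    row("536365", "85123", 6, datetime(2010, 12, 1, 8, 26), 2.5, "17850"),  # duplicate
    row("536366", "22633", 1, datetime(2010, 12, 1, 8, 30), 1.0, ""),       # blank ID
    row("536367", "71053", 2, datetime(2010, 12, 1, 9, 0), 3.0, "17850"),
    row("536368", "71053", 2, datetime(2010, 12, 2, 9, 0), 3.0, "17850"),
    row("C536379", "71053", -2, datetime(2010, 12, 3, 9, 0), 3.0, "17850"),
]
cleaned = clean_transactions(sample)
```

In the sample, the cancellation has two earlier counterparts, so only the later one (2 December) is removed along with the cancellation row.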

Output: cleaned_data.csv

RFM

RFM stands for the three dimensions:

  • Recency: How recently did the customer purchase?
  • Frequency: How often do they purchase?
  • Monetary Value: How much do they spend?

Definitions vary. The lifetimes library uses the following:

  • frequency represents the number of repeat purchases the customer has made, so it is one less than the total number of purchases. Strictly speaking, purchases are counted per time period rather than per transaction: with days as the unit, the count is based on the number of days on which the customer made a purchase.
  • T represents the age of the customer in whatever time units are chosen (for example, weeks). This is equal to the duration between a customer’s first purchase and the end of the period under study.
  • recency represents the age of the customer when they made their most recent purchase. This is equal to the duration between a customer’s first purchase and their latest purchase. (Thus if they have made only one purchase, the recency is 0.)
  • monetary_value represents the average value of a given customer’s purchases. This is equal to the sum of all of a customer’s purchases divided by the total number of purchases. Note that the denominator here is different from the frequency described above.
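
To make these definitions concrete, here is a minimal plain-Python sketch that computes the four quantities with days as the time unit. It follows the definitions as worded above rather than calling the lifetimes library, so the library's own helper may differ in detail. The function name, the tuple layout and the sample data are assumptions, and each purchase day is treated as one purchase.

```python
from datetime import date


def rfm_summary(transactions, period_end):
    """Summarise (customer_id, purchase_date, sales) tuples per customer."""
    # Aggregate sales per customer per day, so that several transactions
    # on the same day count as a single purchase period.
    by_customer = {}
    for customer, day, sales in transactions:
        daily = by_customer.setdefault(customer, {})
        daily[day] = daily.get(day, 0.0) + sales

    summary = {}
    for customer, daily in by_customer.items():
        first, last = min(daily), max(daily)
        summary[customer] = {
            "frequency": len(daily) - 1,           # repeat purchase days
            "recency": (last - first).days,        # first to latest purchase
            "T": (period_end - first).days,        # first purchase to period end
            "monetary_value": sum(daily.values()) / len(daily),
        }
    return summary


# Hypothetical transactions for two customers
sample = [
    ("A", date(2011, 1, 1), 10.0),
    ("A", date(2011, 1, 1), 5.0),   # same day as above: one purchase period
    ("A", date(2011, 1, 8), 30.0),
    ("B", date(2011, 1, 10), 20.0),
]
rfm = rfm_summary(sample, period_end=date(2011, 1, 31))
```

Customer A has three transactions but only two purchase days, so the frequency is 1, while customer B, with a single purchase, has frequency 0 and recency 0.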

Output: RFM.csv

Reference:
