This is a fictional project for study purposes. The business context and the insights are not real. The dataset is available on Kaggle.
The business leaders of an e-commerce company concluded that a good strategy to leverage sales is to create a loyalty program for their customers. So, the business team asked the data scientists to select the most valuable customers for the company. Recency, frequency and monetary aspects were considered by the business team as the main characteristics for grouping the customers into clusters.
Machine Learning Clustering Model: Using the dataset from Kaggle, a machine learning clustering model was created to cluster the current clients and to assign future clients to clusters.
The notebook used to create the model is available here.

Attribute | Description |
---|---|
InvoiceNo | Number of purchase invoice. |
StockCode | Code of the stock the object comes from. |
Description | Description of the item purchased. |
Quantity | Quantity of the item purchased. |
InvoiceDate | Date of the invoice. |
UnitPrice | Price of one item of the object purchased. |
CustomerID | Identification number of the client responsible for the purchase. |
Country | The country the purchase comes from. |
- Stock codes made up of letters, like POST, D and PADS, were discarded because it is not possible to know exactly what they mean.
- Unit prices lower than 0.04 were discarded because they appear to be data-entry errors.
- Customers who return almost every purchase they make were not considered.
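The cleaning rules above can be sketched in pandas roughly as follows. This is a minimal sketch with a hypothetical sample; the real data comes from the Kaggle CSV with the schema described above.

```python
import pandas as pd

# Hypothetical sample rows (the real data is the Kaggle CSV).
df = pd.DataFrame({
    "StockCode": ["85123A", "POST", "22423", "D"],
    "UnitPrice": [2.55, 18.00, 0.01, 4.25],
    "Quantity": [6, 1, 2, -1],
    "CustomerID": [17850, 17850, 13047, 13047],
})

# Drop stock codes made up entirely of letters (POST, D, PADS, ...),
# whose meaning is unknown.
df = df[~df["StockCode"].str.isalpha()]

# Drop unit prices below 0.04, which look like data-entry errors.
df = df[df["UnitPrice"] >= 0.04]
```

Negative quantities mark returns; the same table can then be aggregated per customer to drop those who return almost everything they buy.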
- Understand the business problem.
- Download the dataset.
- Clean the dataset removing outliers, NA values and unnecessary features.
- Prepare the data to be used by the modeling algorithms: encode variables, split train and test datasets, and perform other necessary operations.
- Create the models using machine learning algorithms.
- Evaluate the created models to find the one that best fits to the problem.
- Tune the model to achieve a better performance.
- Explore the data to create hypotheses, derive a few insights and validate them.
- Deploy the model in production so that it is available to the user.
- Find possible improvements to be explored in the future.
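As a sketch of the data-preparation step, the recency, frequency and monetary features can be built from the invoice lines roughly as below. The column names follow the dataset schema; the values are hypothetical, and the exact recency definition (days since the last purchase, measured from the day after the newest invoice) is an assumption.

```python
import pandas as pd

# Hypothetical invoice lines in the Kaggle schema.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "InvoiceNo": ["536365", "536366", "536367", "536367", "536370"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-05", "2011-11-20", "2011-11-20", "2011-12-08"]
    ),
    "Quantity": [6, 2, 3, 4, 1],
    "UnitPrice": [2.55, 3.39, 4.25, 1.85, 7.95],
})
tx["Revenue"] = tx["Quantity"] * tx["UnitPrice"]

# Recency is measured from the day after the last invoice in the data.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    recency=("InvoiceDate", lambda s: (snapshot - s.max()).days),
    frequency=("InvoiceNo", "nunique"),
    monetary=("Revenue", "sum"),
)
```

The resulting `rfm` table, one row per customer, is what the clustering algorithms receive (directly or through an embedding).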
I1: The customers of the loyalty program have a purchase volume (products) above 10% of the total purchases.
True: The loyalty program cluster has 34% of the total products purchased.
I2: The customers of the loyalty program have a volume (revenue) of purchases above 10% of the total purchases.
True: The loyalty program cluster has 46% of the total profit.
I3: Loyalty program customers have a lower number of returns than the average of the other customers.
False: The loyalty program cluster has an average quantity of returns above the average of the other customers.
I4: The median billing by loyalty program customers is 10% higher than the median billing overall.
True: The median profit of the loyalty program cluster is 215% above the overall median.
I5: Loyalty program customers are on the third quantile.
False: They are mostly in the first quantile.
The final result of this project is a clustering model. Some dimensionality reduction algorithms, like PCA (Principal Component Analysis), UMAP (Uniform Manifold Approximation and Projection) and t-SNE (t-Distributed Stochastic Neighbor Embedding), were used to create embedding spaces as alternatives to the feature space. Several machine learning modeling algorithms were also tried in order to find the best possible model. In all, three types of models were created: k-Means, GMM (Gaussian Mixture Model) and HC (Hierarchical Clustering). The table below presents some of the models created, the embedding algorithm used to create each model, the number of clusters and the silhouette score.
Model Name | Space Creation | Nº Clusters | Silhouette Score |
---|---|---|---|
k-Means | Features | 2 | 0.69 |
GMM | Features | 2 | -0.01 |
HC | Features | 2 | 0.65 |
k-Means | UMAP | 15 | 0.56 |
GMM | UMAP | 14 | 0.47 |
HC | UMAP | 15 | 0.54 |
k-Means | t-SNE | 13 | 0.45 |
GMM | t-SNE | 13 | 0.36 |
HC | t-SNE | 12 | 0.42 |
k-Means | Tree Embedding Space | 15 | 0.48 |
GMM | Tree Embedding Space | 2 | 0.43 |
HC | Tree Embedding Space | 15 | 0.48 |
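The model comparison above boils down to fitting each algorithm on each space and scoring it with the silhouette. A minimal scikit-learn sketch of that loop, using synthetic blobs as a stand-in for the customer features (the real project runs this on the RFM features and on the UMAP/t-SNE embeddings):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the customer feature space.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

scores = {}
for k in (2, 4, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.2f}")
```

The silhouette score ranges from -1 to 1; higher means the clusters are denser and better separated, which is why the GMM-on-features score of -0.01 in the table above rules that model out.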
The final model was chosen based on the number of clusters the business team requested, considering the silhouette scores. The characteristics of the final model are presented in the table below.
Model Name | Space Creation | Nº Clusters | Silhouette Score |
---|---|---|---|
k-Means | UMAP | 11 | 0.52 |
The number of clusters the business team believes to be the best is eleven. It is a good choice because its silhouette score is among the highest found across all the models created, and the number of clusters is not too high. The clusters' profiles, with their average metrics, are presented in the table below.
Cluster Number | Number of Customers | Customers (%) | Gross Revenue | Recency (days) | Products Purchased | Frequency | Returns |
---|---|---|---|---|---|---|---|
0 | 755 | 13.3 | 6260.09 | 11.7 | 241.9 | 0.05 | 76.6 |
1 | 383 | 6.7 | 2663.62 | 4.2 | 175.5 | 0.14 | 17.5 |
2 | 836 | 14.7 | 1705.62 | 36.6 | 98.1 | 0.04 | 16.7 |
3 | 392 | 6.9 | 1164.39 | 100.2 | 61.8 | 0.19 | 8.4 |
4 | 429 | 7.5 | 1028.46 | 290.7 | 59.7 | 0.63 | 202.2 |
5 | 277 | 4.9 | 906.62 | 362.6 | 65.1 | 1.05 | 2.5 |
6 | 586 | 10.3 | 861.55 | 35.1 | 44.7 | 0.71 | 3.5 |
7 | 595 | 10.4 | 774.43 | 135.1 | 65.1 | 0.77 | 3.7 |
8 | 391 | 6.9 | 647.62 | 199.1 | 47.2 | 1.02 | 2.4 |
9 | 408 | 7.2 | 606.15 | 56.2 | 46.1 | 1.07 | 6.6 |
10 | 643 | 11.3 | 492.88 | 246.8 | 39.6 | 1.02 | 1.6 |
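A profile table like the one above can be produced with a single `groupby` over the per-customer table once the cluster labels are assigned. A minimal sketch with hypothetical values (column names are illustrative, not the project's actual ones):

```python
import pandas as pd

# Hypothetical per-customer table after assigning the k-Means labels.
customers = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1],
    "gross_revenue": [6100.0, 6420.0, 2600.0, 2700.0, 2690.0],
    "recency_days": [12, 11, 4, 5, 4],
    "products": [240, 244, 170, 180, 176],
})

profile = customers.groupby("cluster").agg(
    n_customers=("gross_revenue", "size"),
    avg_gross_revenue=("gross_revenue", "mean"),
    avg_recency=("recency_days", "mean"),
    avg_products=("products", "mean"),
)
profile["pct_customers"] = 100 * profile["n_customers"] / len(customers)
```

Sorting `profile` by average gross revenue is what singles out the loyalty-program cluster (cluster 0 in the table above).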
Several models were created to meet the demand of the business team. Finally, it was possible to find a model that satisfied the data and business teams simultaneously. The features created at the beginning of the modeling process were effective at separating the customers into clusters and finding the cluster with the most valuable customers. The model can now be used by the business team to find the right marketing strategy for each customer according to the group they belong to and achieve higher profit.
- Try other clustering modeling algorithms.
- Try other embedding spaces with more than 2 components.