Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco

 Abstract — Given the increase in the number of e-commerce sites, the number of competitors has become very important. This means that companies have to take appropriate decisions in order to meet the expectations of their customers and satisfy their needs. In this paper, we present a case study of applying LRFM (length, recency, frequency and monetary) model and clustering techniques in the sector of electronic commerce with a view to evaluating customers’ values of the Moroccan e-commerce websites and then developing effective marketing strategies. To achieve these objectives, we adopt LRFM model by applying a two-stage clustering method. In the first stage, the self-organizing maps method is used to determine the best number of clusters and the initial centroid. In the second stage, k-means method is applied to segment 730 customers into nine clusters according to their L, R, F and M values. The results show that the cluster 6 is the most important cluster because the average values of L, R, F and M are higher than the overall average value. In addition, this study has considered another variable that describes the mode of payment used by customers to improve and strengthen clusters’ analysis. The clusters’ analysis demonstrates that the payment method is one of the key indicators of a new index which allows to assess the level of customers’ confidence in the company's Website.


I. INTRODUCTION
According to figures released by the Interbank Electronic Banking Centre (IEBC), the e-commerce sector in Morocco grew by 10.3% in the number of transactions and by 10.5% in the amount spent in the first quarter of 2015, relative to the same period in 2014 [1].
The creation in 2001 of Morocco Telecommerce, the first and leading operator of electronic commerce in Morocco, helped to crystallize the ambitions of many entrepreneurs, who found an effective way to secure their transactions through its electronic payment security system. This system was set up by the company itself, which is certified and recognized by the Interbank Electronic Banking Centre (IEBC), Moroccan banks, and international organizations such as Visa and MasterCard.
Rachid Ait Daoud and Rachid Lbibb are with the Department of Physics, Sultan Moulay Slimane University, PB 523, Beni Mellal, Morocco (e-mail: daoud.rachid@gmail.com, rachid.lbibb@gmail.com).
By the end of 2014, 577 contracts had been signed with certifying bodies such as Morocco Telecommerce (MTC) and the Interbank Electronic Banking Centre (IEBC), compared with 140 at the end of 2010. In other words, 577 e-commerce sites are currently active and referenced by MTC. The trend shows no sign of slowing: the IEBC expects to reach 700 affiliated online merchants by the end of 2015. This continuous arrival of new sites should further stimulate online sales [1].
With this increase in the number of merchant sites affiliated with the IEBC, 527,000 online payment operations were carried out with credit cards, both local and foreign, for a total of MAD 285.3 million in the first quarter of 2015. The IEBC added that activity by Moroccan cards rose from 487,000 transactions in the first quarter of 2014 to 506,000 during the same period in 2015 (+5.8%), and from MAD 242.5 million to MAD 249.8 million (+3%) in terms of amount [1]. These figures suggest that Moroccan internet users are becoming more familiar with online payment.
To survive and cope with the competition brought by the explosive growth of electronic commerce, companies must develop innovative marketing activities to identify different customers and do their utmost to retain them or, at least, to care for the most loyal ones and keep them satisfied, because identifying such segments can be the basis for an effective strategy to target and predict potential customers [2].
Market segmentation is the process of identifying key groups within the general market that share specific characteristics and consumption habits; it enables management to customize products or services to fulfill their needs [3]. Nowadays, the RFM model, developed by Hughes (1994), is one of the most common methods for segmenting customers and identifying their value to a company. This method relies on recency, frequency and monetary measures, three important variables that capture the behavioral characteristics of customers and influence their future purchasing possibilities [4]. By adopting the RFM model, marketing managers can effectively target valuable customers and then develop marketing strategies for them based on their values [5].
Recent studies find that adding supplementary variables to the classical RFM model can improve its ability to predict customer behavior [6]. For example, [7] extended the RFM model by adding the customer relation length (L), yielding the LRFM model, because with RFM alone companies cannot effectively distinguish between short-life and long-life customers [8]. L measures the time period between the first visit and the last visit of a particular customer. In this paper, we use the LRFM values as inputs to clustering algorithms to determine customer loyalty for an online selling company in Morocco. This real case study combines the LRFM model with data mining techniques (cluster analysis) to achieve better market segmentation and improve customer satisfaction. Self-organizing maps and k-means are used to group all customers into clusters according to their LRFM values, and the characteristics of each cluster are examined in order to identify and retain profitable and loyal customers.
The customer purchase database consists of 730 customers who purchased directly from the company website from November 2013 to January 2015. The profile of each customer includes the customer identifier, gender, birth date, city, shopping frequency, date of first transaction, date of last purchase, total expense, and mode of payment.
The remainder of the paper is organized as follows: Section II provides the literature review on the RFM and LRFM models and on data mining techniques. Section III reports the methodology used to conduct this study. Section IV presents the empirical results. Finally, conclusions, managerial implications, limitations and further research are presented.

II. LITERATURE REVIEW
A. RFM and LRFM Models
Recency, frequency and monetary (RFM) analysis is an effective segmentation method and a form of behavioral analysis that can be employed for market segmentation [9], [10]. Reference [9] describes the main asset of the RFM method as twofold: it provides a behavioral analysis of customers that groups them into homogeneous clusters, and it supports a marketing plan tailored to each specific market segment. RFM analysis improves market segmentation by examining when (recency), how often (frequency), and how much money (monetary) customers spend on a particular item or service [11]. Reference [11] summarized that customers who had bought most recently, most frequently, and had spent the most money would be much more likely to react to future promotions.
The advantage of the RFM model lies in its relevance: it operates on several variables that are all observable and objective, and all of them are available from each customer's order history. These variables are classified according to three independent criteria, namely recency, frequency and monetary [12]. Recency is the time interval between the last purchase and a present time reference; a lower value corresponds to a higher probability that a customer will make a repeat purchase. Frequency is the number of transactions that a customer has made in a particular time period, and monetary is the amount of money spent in this period [13].
The traditional approach to the RFM model is to sort the customers' data by each RFM variable and then divide them into five equal quintiles [2], [14]. Segmentation begins by sorting all customers on recency, then on frequency and monetary. For recency, the customer database is sorted in ascending order (most recent purchasers at the top). Customers are then sorted for frequency and monetary in descending order (those who purchased most frequently and spent the most money at the top). The customers are then split into quintiles (five equal groups): the top 20% segment is assigned a value of 5, the next 20% segment a value of 4, and so on. As a result, every customer is represented by one of 125 RFM cells, namely 555, 554, 553, ..., 111 [15], [16].
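The quintile coding described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of any cited reference: the function names `quintile_scores` and `rfm_cells` are hypothetical, recency is assumed to be expressed as days since the last purchase (so a smaller value is better), and ties are broken by input order.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 5 (top 20%) down to 1 (bottom 20%)."""
    n = len(values)
    # Sort customer indices so that the best customer comes first.
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = 5 - (rank * 5) // n  # top 20% -> 5, next 20% -> 4, ...
    return scores


def rfm_cells(recency_days, frequency, monetary):
    """Return one three-digit RFM cell (e.g. '555') per customer."""
    r = quintile_scores(recency_days, higher_is_better=False)  # recent buyers first
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
```

With five customers, each one falls into its own quintile, so a customer who is best on all three variables lands in cell 555 and the worst in cell 111.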
Customers with the highest scores are the most profitable. In this study, we adopt another approach, proposed by [17], which uses the original data rather than the coded numbers. The definitions are as follows: recency is the time interval between the first day of the study period and the last purchase; frequency is the number of transactions that a customer has made in a particular time period; and monetary is the amount of money spent in this period.
Some researchers have developed new RFM models by adding parameters in order to examine whether they achieve better results than the basic RFM model [18]-[20]. For example, [19] selected targets for direct marketing from a database by extending RFM to RFMTC, adding two parameters: time since the first purchase (T) and churn probability (C). Another version, the Timely RFM (TRFM) model proposed by [21], adds the period of product activity in order to determine the relationship between product properties and purchase periodicity, i.e., to analyze different product demands at different moments. Chang and Tsay proposed the LRFM model, which takes the customer relation length into account, to resolve the difficulty the RFM model has in distinguishing between customers who have long-term and short-term relationships with the company [7]. In addition, [22] suggests that customer loyalty and profitability depend on the relationship between a company and its customers. To identify the most loyal customers, it is therefore necessary to consider the customer relation length (L), defined as the number of time periods (such as days) from the first purchase to the last purchase in the database.

B. Cluster Analysis
Cluster analysis groups all customers into several clusters based on similarities among these customers [23]. Clustering techniques identify a set of groups that both minimize within-group variation and maximize between-group variation according to a distance or dissimilarity function [24].
The self-organizing map (SOM) is an unsupervised neural network methodology that needs only the input data; it has been used for clustering in problem solving [25] and market screening [26]. The network is formed by an unsupervised competitive learning algorithm, which detects by itself (that is, without human intervention during the learning process) patterns, strong features, and correlations in large input data and codes them in the output [27]. Patterns in a high-dimensional input space are originally very complicated; when projected on a graphical map display, their structure, after clustering, becomes both more understandable and more transparent [28].
K-means is the most common algorithm for clustering n vectors into k partitions (k < n) based on their attributes, according to some measure [29]. The name comes from the fact that k clusters are identified and the center of each cluster is the mean of all vectors within it. The algorithm starts by choosing k random initial centroids, then assigns each vector to the nearest centroid using Euclidean distance and recalculates the centroids as the means of the assigned vectors. This process is repeated until vectors no longer change clusters between iterations [30]. K-means is a non-hierarchical method.
Both methods have disadvantages. With the result generated by SOM, it is difficult to detect cluster boundaries, which limits its use for automated knowledge discovery [25]. In k-means, the number of clusters and the initial starting points are selected randomly, so the algorithm has to be run several times to identify stable patterns, because the final result depends on the initial starting points (different initial k objects may produce different clustering results). Given these weaknesses of SOM and k-means, integrating the two methods becomes desirable. Reference [31] took this view, adopting a two-stage clustering method that integrates a hierarchical method with a non-hierarchical one.
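The k-means iteration described above (assign each vector to its nearest centroid, recompute centroids as means, stop when assignments no longer change) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in this study: the function name `kmeans`, the optional `init` argument and the `seed` parameter are hypothetical, and an empty cluster simply keeps its previous centroid.

```python
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster points (sequences of numbers) into k groups.

    Returns (labels, centroids). If init is None, k random points are
    used as starting centroids, so results may vary with the seed.
    """
    rng = random.Random(seed)
    centroids = [list(c) for c in (init or rng.sample(points, k))]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # no vector changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Passing explicit starting centroids through `init` removes the dependence on random initialization, which is the weakness the two-stage approach is meant to address.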
Kuo et al. [32] pointed out that iterative partitioning methods are preferable to hierarchical methods if the initial centroids and the number of clusters are provided. With this information, the iterative method consistently finds better clusters with higher accuracy than hierarchical methods, and yields results faster because the initialization procedure that ultimately determines the number of iterations has already been executed. One example, proposed by [31], is a two-stage clustering method that uses Ward's minimum variance method to obtain the number of clusters and the starting points. A non-hierarchical method such as k-means then uses the result of Ward's minimum variance method to find the final cluster solution. Kuo et al. [32], in turn, proposed a modified two-stage method that applies self-organizing feature maps to determine the number of clusters for the k-means method. The reason is that self-organizing maps can converge very fast, since they are a kind of learning algorithm that continually updates or reassigns the observations to the closest cluster. Therefore, this study uses self-organizing feature maps to determine the number of clusters and the initial starting points that the k-means method needs.
In the first stage, the data set is clustered with the SOM. From the final output array, we can easily determine the candidate number of clusters as well as the initial centroids. In the second stage, the starting points and the approximate number of clusters (k) determined in the first stage are used with the k-means method. Wei et al. [24] pointed out that self-organizing maps (SOM) and the k-means method are commonly used for cluster analysis.
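The two-stage idea can be illustrated with a small pure-Python sketch. It rests on several assumptions that are not taken from the paper: the SOM is a tiny rectangular grid with a Gaussian neighbourhood and linearly decaying learning rate, units are initialised from evenly spaced samples, and the number of clusters is read off as the number of map units that win at least one sample. All function names (`train_som`, `som_seeds`, `kmeans`, `two_stage`) and parameter values are hypothetical.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Stage 1: train a small rows x cols SOM; returns {(row, col): weights}."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: initialise units from evenly spaced samples of the data.
    weights = {u: list(data[i * len(data) // len(grid)]) for i, u in enumerate(grid)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # one shuffled pass over the data
            bmu = min(grid, key=lambda u: dist2(x, weights[u]))  # best-matching unit
            for u in grid:
                h = math.exp(-dist2(u, bmu) / (2.0 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
    return weights


def som_seeds(data, weights):
    """Units that win at least one sample give k and the initial centroids."""
    winners = {min(weights, key=lambda u: dist2(x, weights[u])) for x in data}
    return [weights[u] for u in sorted(winners)]


def kmeans(data, centroids, max_iter=100):
    """Stage 2: k-means started from the SOM-derived centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j])) for p in data]
        if new == labels:
            break
        labels = new
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def two_stage(data, rows, cols):
    """Run SOM first, then k-means seeded with the winning SOM units."""
    seeds = som_seeds(data, train_som(data, rows, cols))
    return kmeans(data, seeds)
```

In practice the L, R, F and M columns would normally be rescaled to comparable ranges before training, since Euclidean distance is otherwise dominated by the variable with the largest magnitude; that preprocessing step is omitted here.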

III. RESEARCH METHODOLOGY
This section describes the proposed model for determining loyal and profitable customers.
The purpose of this case study is to segment customers using the LRFM model and clustering algorithms (SOM and k-means) in order to identify loyal and profitable customers, achieving maximum benefit and a win-win situation.
To identify the most profitable customers, it is also necessary to consider the "mode of payment" factor in the company. Fig. 1 shows the required steps for the proposed model.

A. Understanding Data
The dataset used in this case study was provided by a company selling online in Morocco and was collected through its e-commerce website.
All of the transactions carried out by customers are stored in a MySQL database. From this database, we design a data warehouse that contains a wide variety of products, descriptive information on each customer, and transactional data.
The transactional data cover 730 customers who purchased from the company's website from November 2013 to January 2015.
Customers have four modes of payment: cash on delivery, online credit card, bank transfer, and payment in three installments.

B. Data Preparation for Segmentation
Data preprocessing is one of the most important and often most time-consuming aspects of a data mining project [33], [34]. In this case study, data preprocessing techniques such as data selection, data cleaning, data integration and data transformation were used to improve the quality of data clustering.
The purchase orders include many columns, such as transaction id, product id, customer id, ordering date, item price, item quantity purchased, total amount of money spent, and payment mode.
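From order rows with columns like those listed above, the per-customer L, R, F and M values could be derived as in the following sketch. It follows the definitions given earlier in the paper (L is the number of days from first to last purchase, R the number of days from the first day of the study period to the last purchase, F the number of transactions, M the total amount spent). The function name `lrfm`, the tuple layout and the assumption of one row per transaction are illustrative choices, not details from the source.

```python
from collections import defaultdict
from datetime import date


def lrfm(orders, study_start):
    """orders: iterable of (customer_id, order_date, amount), one row per transaction.

    Returns {customer_id: {"L": days, "R": days, "F": count, "M": total}}.
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in orders:
        by_customer[customer_id].append((order_date, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        dates = [d for d, _ in rows]
        first, last = min(dates), max(dates)
        result[customer_id] = {
            "L": (last - first).days,        # relationship length
            "R": (last - study_start).days,  # larger value = more recent purchase
            "F": len(rows),                  # number of transactions
            "M": sum(a for _, a in rows),    # total amount spent
        }
    return result
```

Note that with this definition of recency, a higher R means a more recent last purchase, which is the opposite direction from the "days since last purchase" convention of the classical quintile approach.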
The customer table includes fields such as customer id, gender, marital status, birth date, email, address and city; the product table includes attributes such as product id, barcode, brand, category, subcategory, price and quantity.
The customers in Cluster 7 have a low L value, which indicates that they have not yet established a long relationship with the online store. Despite the lower value of L, the average values of R, F and M are above the overall average, which suggests that these customers have purchased recently and frequently and have spent a large amount of money. They could therefore be customers with profit potential in the near future. The company must develop an effective marketing strategy that encourages the customers in this cluster to migrate to Cluster 6: encouraging them to continue purchasing online, providing a marketing program adapted to their purchases, and sending emails focused on their needs (for example, a monthly newsletter informing customers about the latest special promotions, emails wishing them a "happy birthday" or a "happy anniversary" of the date they became customers, requests for their opinion, and so on). This helps to build a solid relationship between the online store and its customers and is more likely to create loyal customers.
Cluster 4 contains the smallest number of customers (only 60). They have a low average length and a relatively low average frequency, but their average recency and monetary values are very high. Even though they have made very few transactions, they have spent a very significant amount of money.
Table VI shows that the majority of these customers prefer payment in three installments (D), a mode of payment recently introduced by the online store and reserved exclusively for products priced above MAD 1500. This indicates that these customers usually buy expensive items. Cluster 4 is therefore called the potentially loyal big spenders.
The online store might concentrate its efforts on maximizing the loyalty of these big spenders. It should place particular emphasis on their satisfaction, because customer satisfaction contributes to customer loyalty [35], e.g., by offering them special services such as special discounts and by providing targeted and personalized promotions based on their profiles and previous purchases. In this way, the online store keeps these customers coming back for more purchases.
Cluster 8 has a higher L value but lower R, F and M values than the overall averages. The customers in Cluster 8 are former customers who show little interest in the items and services provided by the online store. This lack of interest is reflected in the low number of transactions made by these customers, the low value of recency, and their small contribution to the company. They are beginning to lose contact with the online store, as they have not purchased for a long time. Perhaps these customers were not satisfied with the services and products they received, or they have been attracted by competitors, or they have lost confidence in the merchant site. This cluster is called the lost customers. The company should identify the reasons and solve the problems quickly in order to bring these customers back.
For Clusters 1 and 2, the values of L and R are above the average values. These customers combine high recency with, more importantly, long relationships with the online store. These two clusters can be described as loyal, but they do not yet have the profile of profitable customers, because their contribution to the company is still low even though they have the longest relationship with the online store of all clusters. The only difference between the two is that the customers in Cluster 1 purchased more often than those in Cluster 2.
Customers in these clusters might become more important if the online store manages to move them to Cluster 6, which represents the best customers of the company. The best strategy to achieve this goal would be to use psychological pricing techniques and special discounts, which have a great impact on purchasing decisions. This strategy is efficient in that it encourages customers to purchase more frequently and to spend much more money. Among these techniques is the word "FREE", which attracts the attention of any customer, even those who were not planning to spend anything in the first place. Other "FREE" techniques include offering free delivery on all purchases of MAD 150 or more and proposing special promotional offers such as 1+1=3. The last technique is "the charming #9", which consists of lowering item prices by one cent (prices ending in 9, e.g., MAD 4.99) in order to boost sales.
A number of studies and experiments have confirmed this trend: for example, the winter clearance catalog experiment of a direct-mail women's clothing retailer conducted by Drs. Robert Schindler and Thomas Kibarian in 1996 [36], the cheese experiment conducted in 2005 by Nicolas Gueguen and Odile Jacob [37], and the door-to-door pancake experiment, also conducted in 2005 by Nicolas Gueguen and Odile Jacob [37] in order to confirm the results of the previous experiments. All of these experiments confirm an increase in the number of customers, of up to 29.7%, obtained by just lowering prices by one cent (e.g., $5.99 vs. $6.00).
Finally, Cluster 5 has high F and M values but low L and R values. The spending and the number of transactions indicate that these customers purchased frequently and spent a substantial amount of money in a short period. They can be considered profitable customers, but the very low value of R indicates that they have not purchased from the online store for a long time. They are dormant customers. Something went wrong with these customers, because they showed a keen interest in the online store early on. Perhaps they will soon stop purchasing on the website altogether, for reasons such as no longer needing the products and services offered by the company or being dissatisfied with the quality of the products. The online store must therefore maintain close contact with these customers. The main marketing strategy for them is to re-establish contact through a customer reactivation program, e.g., exceptional time-limited promotional offers that create a sense of urgency, trigger purchases, restore the relationship with these customers and, therefore, increase the retention rate.
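The cluster profiles discussed above all rest on the same comparison: each cluster's average L, R, F and M is set against the overall average. A minimal sketch of that comparison follows. The function names and the arrow notation (an up arrow for "above the overall average", a down arrow otherwise) are assumptions made for illustration; the paper's own symbols were lost in this copy of the text, and the numbers in any usage are hypothetical.

```python
from collections import defaultdict

FIELDS = "LRFM"


def mean_by_cluster(rows):
    """rows: list of dicts with keys 'cluster', 'L', 'R', 'F', 'M'."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    return {
        c: {k: sum(r[k] for r in members) / len(members) for k in FIELDS}
        for c, members in groups.items()
    }


def overall_mean(rows):
    """Average of each LRFM variable over all customers."""
    return {k: sum(r[k] for r in rows) / len(rows) for k in FIELDS}


def lrfm_pattern(cluster_mean, overall):
    """'↑' marks a cluster average above the overall average, '↓' otherwise."""
    return "".join(
        k + ("↑" if cluster_mean[k] > overall[k] else "↓") for k in FIELDS
    )
```

Under this notation, a cluster such as Cluster 7 (short relationship, but recent, frequent and high-spending) would read L↓R↑F↑M↑, while a cluster such as Cluster 8 would read L↑R↓F↓M↓.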
To be responsive and to make the right decisions for the development of the online store's activity, we used Microsoft Power Business Intelligence, a tool that supports a decision-making approach. It allows us to produce secure and interactive dashboards with which marketing managers and sales directors can regularly consult statistics and reports, share the reports with the main actors (management committee, marketing team, CEO, etc.) and publish the reports by period, in order to obtain a global and reliable view of the performance of the merchant site.
A more detailed analysis of the length of the relationship, recency, frequency, monetary value, gender and mode of payment for the nine clusters is reported in Fig. 3.
Fig. 3 is a report that contains multiple visualizations of the cluster data. With the exception of Cluster 3, the number of male customers is always larger than the number of female customers. The majority of customers in Clusters 3, 9, 7 and 5 prefer to pay for their purchases by cash on delivery. Payment by credit card is the most popular mode in Clusters 6, 1 and 2. Payment by bank transfer is often used by customers in Cluster 8. Finally, payment in three installments remains the preferred mode of customers in Cluster 4. When the length (L) is examined across payment methods, the results show that the longer the relationship between the merchant site and the customers, the more the customers trust the electronic payment system proposed by the online store. This applies to Clusters 1, 2 and 6, whose customers prefer the credit card as their payment method. Conversely, clusters with a low value of L, such as Clusters 3, 5, 7 and 9, prefer to pay for their purchases by cash on delivery. Perhaps customers who have recently joined the merchant site do not yet trust electronic payment systems.
The payment method is one of the key indicators of a new index that makes it possible to assess the level of customer confidence in the company's website. The online store should therefore encourage customers to pay for their online purchases with Moroccan or foreign cards in order to create a climate of trust, because if the company manages to establish a link of trust with its customers, this will help it to promote a sense of satisfaction and encourage a long-term relationship [38], [39].
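The per-cluster payment preferences discussed above amount to finding the most frequent payment mode within each cluster. A possible sketch, assuming the data are available as (cluster, payment mode) pairs with one pair per customer; the function name `dominant_payment_mode` and the mode labels used in any example are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_payment_mode(customers):
    """customers: iterable of (cluster, payment_mode) pairs, one per customer.

    Returns {cluster: (most common mode, share of the cluster using it)}.
    """
    counts = defaultdict(Counter)
    for cluster, mode in customers:
        counts[cluster][mode] += 1

    result = {}
    for cluster, counter in counts.items():
        mode, n = counter.most_common(1)[0]
        result[cluster] = (mode, n / sum(counter.values()))
    return result
```

Reporting the share alongside the dominant mode helps to distinguish a clear preference from a narrow plurality, which matters if the payment mode is to be read as an indicator of trust.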
Fig. 4 shows how the visualizations belonging to the same report can be filtered, can drive other visualizations, and can interact with them. For example, to visualize the results of Cluster 7, the user clicks on Cluster 7 in the CLUSTER legend; the results for this cluster are then highlighted in the report and the rest of the results are dimmed. Another report, illustrated in Fig. 5, provides more details about the nine clusters, including the total revenue, the relative value of each cluster in terms of total revenue and number of transactions, the value of the average basket per cluster and, finally, the number of transactions and the revenue generated by each payment method.
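The per-cluster indicators listed for this report (total revenue, share of total revenue, number of transactions and average basket) are simple aggregates. A minimal sketch, assuming one (cluster, amount) pair per transaction; the function name `cluster_report` and the dictionary keys are hypothetical and do not reflect how the dashboard itself computes them.

```python
def cluster_report(orders):
    """orders: iterable of (cluster, amount) pairs, one per transaction."""
    totals, counts = {}, {}
    for cluster, amount in orders:
        totals[cluster] = totals.get(cluster, 0.0) + amount
        counts[cluster] = counts.get(cluster, 0) + 1

    grand_total = sum(totals.values())
    return {
        c: {
            "revenue": totals[c],
            "revenue_share": totals[c] / grand_total,  # relative value of the cluster
            "transactions": counts[c],
            "average_basket": totals[c] / counts[c],   # revenue per transaction
        }
        for c in totals
    }
```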

Fig. 7 Number o... [caption and accompanying text truncated in the source]