Exploratory analysis of a dataset is a critical step at the beginning of any data science project. This often involves visualising the data, for example with histograms or box plots (for individual dimensions / features), with a scatter plot (for pairs of features), or by looking at the correlation matrix. (Side note: in R, the corrplot package is great for graphical display of the correlation matrix, which is more effective than inspecting a matrix of numbers.)
These visualisations are a great starting point, but they are designed for viewing static data. What if your dataset is more dynamic? For example, each data point may be a snapshot of a customer at some point in time, with multiple data points making up a ‘journey’ for each customer. You probably want to see how customer behaviours evolve over time. This is easy for a single customer, as you can just plot a time series along the customer dimension that you care about. But most of the time you also want to look at the entire population to discern any trends or patterns which may allow you to serve your customers better. Enter the heat map.
A heat map is a great way to visualise multiple customer journeys simultaneously on the same plot. In general, you use a heat map when you want to plot the values of a matrix, where the two axes correspond to the rows and columns, and the values are mapped to a continuous colour scale. In our context of customer journeys, each row represents a customer and the columns show that customer's behaviour (e.g. transaction amount, number of transactions, etc.) in time order. The colour of a cell indicates the strength of the behaviour.
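To make the matrix layout concrete, here is a minimal sketch of how such a customer-by-period matrix could be assembled and drawn. The record format `(customer_id, period_index, value)`, the function names and the sample values are all hypothetical, chosen only for illustration. The plotting function assumes matplotlib is installed and is not called in the example.

```python
from collections import defaultdict


def build_journey_matrix(records, n_periods):
    """Turn (customer_id, period_index, value) records into one row per customer.

    Values that fall in the same customer/period cell are summed; cells with
    no records stay at 0.0. The record format is an assumption for this sketch.
    """
    rows = defaultdict(lambda: [0.0] * n_periods)
    for customer_id, period, value in records:
        if 0 <= period < n_periods:  # ignore records outside the analysis window
            rows[customer_id][period] += value
    customer_ids = sorted(rows)
    matrix = [rows[c] for c in customer_ids]
    return customer_ids, matrix


def plot_heat_map(matrix):
    """Draw the matrix as a heat map (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, aspect="auto", cmap="viridis")
    ax.set_xlabel("Period since joining")
    ax.set_ylabel("Customer (row index)")
    fig.colorbar(image, ax=ax, label="Spend")
    return fig


# Hypothetical sample data: three customers observed over three periods.
records = [
    ("a", 0, 20.0), ("a", 1, 5.0), ("a", 1, 2.5),
    ("b", 0, 12.0),
    ("c", 0, 3.0), ("c", 2, 9.0),
]
customer_ids, matrix = build_journey_matrix(records, n_periods=3)
```

Calling `plot_heat_map(matrix)` would then map each cell to a colour, with one row per customer and one column per period.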
The above plot shows a visualisation of a proportion of customers and their monthly spend values for each week since they first joined. The matrix is ordered by the total of each row, i.e. by the total spend of a customer over the period of analysis. It’s immediately clear that there is a group of customers who purchased only in the first month and then disappeared. Note that the y-axis represents the row IDs and carries no meaning in this plot. Looking at this, one may decide to dig deeper into this group of early churners to understand why they left and whether any actions could improve their retention.
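The row ordering and the early-churner group described above can be sketched in a few lines. This is only an illustration: the function names, the descending sort direction, the `active_periods` parameter and the sample matrix are assumptions, not details taken from the original analysis.

```python
def order_by_total(matrix):
    """Sort rows by their total, largest first (the direction is an assumption)."""
    return sorted(matrix, key=sum, reverse=True)


def early_churner_rows(matrix, active_periods=1):
    """Return indices of rows with spend in the first periods and none afterwards."""
    return [
        i
        for i, row in enumerate(matrix)
        if any(v > 0 for v in row[:active_periods])
        and all(v == 0 for v in row[active_periods:])
    ]


# Hypothetical spend matrix: one row per customer, one column per period.
spend = [
    [20.0, 7.5, 0.0],
    [12.0, 0.0, 0.0],
    [3.0, 0.0, 4.0],
    [30.0, 0.0, 0.0],
]
ordered = order_by_total(spend)
churners = early_churner_rows(ordered)
```

Here `churners` holds row positions in the ordered matrix, which is consistent with the y-axis being nothing more than a row ID.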
Hopefully this post has given you another tool to better understand your customers beyond the standard visualisation toolkit. The great thing about heat maps is that they scale easily: once you organise your data according to some order, you can sweep through as much of it as the graphical capability of your tool or machine allows, and still discern patterns in your data.
Customer segmentation is a practice widely used by companies to divide their customer base into sub-groups that share similar characteristics, and then deliver targeted, relevant messages to each group. Segmentation is done by looking at customer attributes such as demographics (e.g. age, gender, income, residential address) and / or transactional patterns (e.g. RFM: the recency, frequency, and monetary value of their transactions). One key challenge often encountered when doing this is how to measure the goodness of your segmentation.
Qualitative and mathematical objectives
A commonly agreed, qualitative objective for a good segmentation (or clustering, as it is called in machine learning) is that similar customers should be in the same group and different customers should be in separate groups. This criterion can be inspected visually if your data has few dimensions (typically fewer than 4), as in the figure below (image source: http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/mvoget/cluster/kmeans_diagram.png). There we see two distinct coloured clusters, each with a point at the centre called the cluster centroid. If each data point corresponds to a customer, the centroid can be thought of as the most representative member of a group.
If we know the representative of each group, then a natural segmentation mechanism is to find the representative most similar to a customer and assign him or her to that group. This idea is used in the popular k-means clustering method, whose objective is to minimise the total difference between the customers within each group, summed across all groups. So one convenient way to evaluate the quality of a segmentation, for example when considering the number of segments to use (let’s call this k), is to compute the objective for different values of k and compare the results. Because the total difference generally keeps shrinking as k grows, simply choosing the smallest value would favour the largest k, so a common practice is to pick the k beyond which the improvement levels off. The disadvantage of this approach, though, is that this mathematical objective may not align with your business strategy, and the solution may look like a black box, making the resulting segmentation hard to act on.
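As a rough illustration of comparing the objective across values of k, the sketch below implements a basic k-means loop in plain Python and reports the sum of squared distances from each point to its nearest centroid. Using squared Euclidean distance as the "difference" and picking evenly spaced initial centroids are assumptions made here for reproducibility. The sample points are hypothetical, and a real project would more likely use an established library.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, n_iterations=100):
    """Basic k-means; returns (centroids, objective).

    Initial centroids are evenly spaced picks from the input list, a
    simplification assumed here so that the result is deterministic.
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(n_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean_point(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    objective = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, objective


# Hypothetical customers described by two features, forming two tight groups.
customers = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
]
objective_by_k = {k: k_means(customers, k)[1] for k in (1, 2, 3)}
```

On this toy data the objective drops sharply from k = 1 to k = 2 and only slightly from k = 2 to k = 3. That is the pattern to look for, since the objective on its own tends to keep decreasing as k increases.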
Segmentation with a business objective
Instead of performing customer segmentation purely from an optimisation perspective, it is important to tie it to your business objective and make sure that you have designed a specific strategy for each of the final segments. For example, in one project we looked at customer behaviours in a short period after acquisition and divided the customers into two segments: high-value and mass. The high-value segment contains only 15% of the new customers but accounts for 70% of the future lifetime value. This allows the business to create two bespoke customer journeys and allocate more resources to retaining the more valuable customers. In this case the segmentation is determined by maximising the number of high-value customers that can be served, given the available budget for this segment. This is still a constrained optimisation, but it is driven by a business objective and is therefore easier to execute with marketing campaigns.
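One way to picture a budget-driven split is sketched below. It is not the method used in the project. It assumes a predicted value per customer and a fixed cost of serving each high-value customer, and every name and number in it is hypothetical.

```python
def segment_by_budget(predicted_value, budget, cost_per_customer):
    """Label the top customers 'high-value' until the budget is used up.

    predicted_value: dict mapping customer id to a predicted future value.
    budget and cost_per_customer are assumed inputs; the number of customers
    that can be served is the budget divided by the cost, rounded down.
    """
    capacity = int(budget // cost_per_customer)
    ranked = sorted(predicted_value, key=predicted_value.get, reverse=True)
    high_value = set(ranked[:capacity])
    return {
        customer: ("high-value" if customer in high_value else "mass")
        for customer in predicted_value
    }


# Hypothetical predicted values for eight new customers.
predicted_value = {
    "c1": 900, "c2": 50, "c3": 40, "c4": 700,
    "c5": 30, "c6": 20, "c7": 10, "c8": 15,
}
segments = segment_by_budget(predicted_value, budget=250, cost_per_customer=100)
high_value_ids = sorted(c for c, s in segments.items() if s == "high-value")
```

With a budget that covers two customers, the two with the highest predicted value land in the high-value segment and everyone else goes to mass. Changing the budget moves the boundary directly, which is what makes a rule like this easy to explain and act on.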