Exploratory analysis of a dataset is a critical step at the beginning of any data science project. It often involves visualising the data: plotting histograms or box plots of individual dimensions (features), drawing scatter plots of pairs of features, or looking at the correlation matrix. (Side note: in R, the corrplot package is great for displaying the correlation matrix graphically, which is more effective than inspecting a grid of numbers.)
These visualisations are a great starting point, but they are designed for static data. What if your dataset is more dynamic? For example, what if each data point is a snapshot of a customer at some point in time, and several data points together make up a ‘journey’ for each customer? You probably want to see how customer behaviour evolves over time. This is easy for a single customer: just plot a time series of whichever customer attribute you care about. Most of the time, though, you also want to look at the entire population to discern trends or patterns that may help you serve your customers better. Enter the heat map.
A heat map is a great way to visualise many customer journeys simultaneously on the same plot. In general, you use a heat map to plot the values of a matrix: the two axes correspond to the rows and columns, and each value is mapped to a continuous colour scale. In the context of customer journeys, each row represents a customer, and the columns show that customer's behaviour (e.g. transaction amount, number of transactions) in time order. The colour of a cell indicates the strength of the behaviour.
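To get customer journeys into this matrix shape, the snapshot data usually has to be pivoted from a long format (one record per customer per period) into a wide one (one row per customer, one column per period). The sketch below does this with pandas; the column names `customer_id`, `month` and `spend`, and the toy values, are hypothetical and not taken from the original analysis.

```python
import pandas as pd

# Hypothetical long-format snapshots: one row per customer per month since joining.
# Column names and values are illustrative assumptions only.
snapshots = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "c", "c"],
    "month":       [0, 1, 2, 0, 0, 2],
    "spend":       [20.0, 35.0, 10.0, 50.0, 5.0, 15.0],
})

# Pivot into a customer x month matrix: rows are customers, columns are months
# since joining (in time order), cells hold total spend for that month.
# Months with no record are filled with 0 (assumed to mean "no spend").
journeys = snapshots.pivot_table(
    index="customer_id",
    columns="month",
    values="spend",
    aggfunc="sum",
    fill_value=0,
).sort_index(axis=1)

print(journeys)
```

Filling gaps with 0 treats "no record" as "no spend". If some customers joined recently and simply have not been observed for later months yet, leaving those cells as NaN (omit `fill_value`) avoids mistaking them for churners.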
The above plot shows a subset of customers and their spend in each month since they first joined. The rows are ordered by their totals, i.e. by each customer's total spend over the analysis period. It's immediately clear that there is a group of customers who purchased only in the first month and were gone after that. Note that the y-axis represents row IDs and carries no meaning in this plot. Seeing this, one may decide to dig deeper into this group of early churners to understand why they left and whether any actions could improve their retention.
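A plot like the one described above can be approximated by sorting the rows of the customer x period matrix by their totals and drawing the result with matplotlib's `imshow`. This is a minimal sketch, not the original code: the function name `plot_journeys`, the colour map and the toy matrix are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_journeys(spend):
    """Hypothetical helper: order a customers x periods matrix by row total
    (highest first) and draw it as a heat map."""
    spend = np.asarray(spend, dtype=float)
    # Sort customers by total spend, descending; "stable" keeps ties in input order.
    order = np.argsort(-spend.sum(axis=1), kind="stable")
    ordered = spend[order]

    fig, ax = plt.subplots(figsize=(6, 8))
    # aspect="auto" stretches cells to fill the axes; "nearest" keeps cell edges crisp.
    im = ax.imshow(ordered, aspect="auto", cmap="viridis", interpolation="nearest")
    ax.set_xlabel("Months since joining")
    ax.set_ylabel("Row ID (ordered by total spend)")
    fig.colorbar(im, ax=ax, label="Spend")
    return fig, ordered, order


# Toy input (made up): 3 customers x 4 months.
toy = [[5, 0, 0, 0], [20, 30, 25, 10], [10, 10, 0, 0]]
fig, ordered, order = plot_journeys(toy)
# fig.savefig("journeys_heatmap.png", dpi=150)  # write the image to disk if needed
```

With `imshow`'s default `origin="upper"`, the highest spender sits at the top, so a block of customers who spend only in the first column shows up as a band that goes dark after month 0.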
Hopefully this post has given you another tool for understanding your customers beyond the standard visualisation toolkit. The great thing about heat maps is that they scale easily: once you have sorted your data in some meaningful order, you can sweep through it in slices as large as your tool or machine can render, and still discern patterns in your data.