The 20% guide to a good single view of customer

All businesses revolve around customers and the products or services offered to them. These days companies compete on their ability to accurately predict customer intents with respect to their products in order to serve customers better. Examples of intents are the potential to purchase an item, cancel a monthly subscription to a service, or close an account. This ability relies heavily on how well a firm knows its customers, and most firms would benefit from having a so-called single view of customer. This view contains hundreds to thousands of attributes which enable detailed customer insights. In this post, I will offer some guidance on how to quickly generate a good view of customer. My hope is that by following the recommendations here, you can attain 80% of the optimal modeling performance with only 20% of the total feature engineering effort.

The importance of feature engineering

It’s widely believed that feature engineering is one of the most effective ways to improve the performance of machine learning models. Kaggle founder and CEO, Anthony Goldbloom, claimed that “Feature engineering wins competitions”, based on the history of thousands of competitions on their site. While this task depends on the specific application and is often considered more of an art than a science, there are a few guiding practices that apply broadly across a wide range of use-cases. They are motivated by the fact that, in most business contexts, there are three main entities (with respect to the goal of modeling), namely the customer, the product, and the company itself. As predictive models are often customer-centric, the single view should capture information about an individual customer, and how he or she interacts with the two remaining entities.

Based on the above motivation, the single view can be divided into the following three main categories.

Descriptive features

Descriptive features are customer characteristics that are often captured at a fixed point in time. They include information pertaining to a customer such as demographics (age, gender, marital status, employment status, and residential address such as suburb or post code). Features related to how the customer was acquired can also be valuable: for example, which channel they signed up through (online vs. offline), whether they received any promotion, and on which day of the week they joined.
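
As a rough illustration, here is a minimal sketch of how a few descriptive features might be derived from a single customer record. The record layout and field names (date_of_birth, signup_channel, promo_code and so on) are assumptions made up for this example, not a prescribed schema.

```python
from datetime import date

# Hypothetical customer record; the field names are illustrative only.
customer = {
    "customer_id": 1,
    "date_of_birth": date(1985, 6, 15),
    "signup_date": date(2023, 3, 4),
    "signup_channel": "online",
    "promo_code": None,
}

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def descriptive_features(record, as_of):
    """Derive point-in-time features from one customer record."""
    dob = record["date_of_birth"]
    # Age in whole years at the reference date: subtract one year if the
    # birthday has not yet occurred in the reference year.
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    age = as_of.year - dob.year - (0 if had_birthday else 1)
    return {
        "age": age,
        "signed_up_online": record["signup_channel"] == "online",
        "received_promotion": record["promo_code"] is not None,
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        "signup_day_of_week": DAY_NAMES[record["signup_date"].weekday()],
    }


features = descriptive_features(customer, as_of=date(2024, 1, 1))
```

Passing the reference date explicitly (as_of) keeps the features reproducible: the same record always yields the same age for a given snapshot date.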

Behavioural / transactional features

Unlike descriptive features, behavioural or transactional features are more dynamic and are often computed over repeated transactions within some period of time. Recurring transactions occur in telecommunication, banking, insurance, entertainment, and many other retail businesses. A transaction can provide granular information, for example the transaction amount or price, the time, and the product / item itself and its category, all of which is invaluable in helping us understand customer preferences. Transactional features are often created along three dimensions:

  1. Recency: When was the first time or last time a transaction took place?
  2. Frequency: How often does a customer purchase? How does that break down into different product categories?
  3. Monetary: What was the value of the transaction? Because this is a continuous (real-valued) measurement, and there are multiple transactions per customer, these features are aggregated. In other words, you compute the min, max, sum, average, median, standard deviation, and other summary statistics of all transaction amounts during the period under consideration.
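
The three dimensions above can be sketched for a single customer as follows. This is only a minimal illustration: the transaction tuples, the feature names and the reference date are assumptions of mine, and the function expects at least one transaction.

```python
from datetime import date
from statistics import mean, median, pstdev

# Hypothetical transactions for one customer: (date, category, amount).
transactions = [
    (date(2024, 1, 5), "grocery", 40.0),
    (date(2024, 1, 20), "electronics", 200.0),
    (date(2024, 2, 10), "grocery", 60.0),
]


def rfm_features(txns, as_of):
    """Recency, frequency and monetary features (needs >= 1 transaction)."""
    dates = [d for d, _, _ in txns]
    amounts = [a for _, _, a in txns]

    # Frequency broken down by product category.
    count_by_category = {}
    for _, category, _ in txns:
        count_by_category[category] = count_by_category.get(category, 0) + 1

    return {
        # Recency: days since the first and the last transaction.
        "days_since_first": (as_of - min(dates)).days,
        "days_since_last": (as_of - max(dates)).days,
        # Frequency: overall and per category.
        "txn_count": len(txns),
        "txn_count_by_category": count_by_category,
        # Monetary: summary statistics of the transaction amounts.
        "amount_min": min(amounts),
        "amount_max": max(amounts),
        "amount_sum": sum(amounts),
        "amount_mean": mean(amounts),
        "amount_median": median(amounts),
        "amount_std": pstdev(amounts),  # population standard deviation
    }


rfm = rfm_features(transactions, as_of=date(2024, 3, 1))
```

In practice the same aggregation would run once per customer, typically as a group-by in SQL or a dataframe library rather than a Python loop.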

It may be worth emphasising again that behavioural and transactional features are computed over a period of time, and customer behaviours at any point in their lifecycle can be analysed by varying the time window. For example, you may look at these features in the first day, first week, first month, first 3 months, last 3 months, last month, last week, etc. of a customer’s tenure, depending on the nature of the purchasing cycle in your business. You can also compare recent behaviours with the behaviours customers showed when they first joined to see how they have evolved. Have they become more valuable, or are they spending less with your firm?
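
To make the windowing idea concrete, here is a small sketch that computes the same aggregate over the first 90 days and the last 90 days of a customer’s tenure, then compares the two. The 90-day window, the dates and the ratio feature are illustrative choices of mine, not recommendations for any particular business.

```python
from datetime import date, timedelta

# Hypothetical transactions for one customer: (date, amount).
txns = [
    (date(2023, 1, 10), 50.0),
    (date(2023, 2, 15), 30.0),
    (date(2023, 11, 20), 120.0),
    (date(2023, 12, 5), 80.0),
]


def window_spend(txns, start, end):
    """Count and total of transactions with start <= date < end."""
    amounts = [amount for d, amount in txns if start <= d < end]
    return {"count": len(amounts), "total": sum(amounts)}


signup = date(2023, 1, 1)   # start of the customer's tenure
as_of = date(2024, 1, 1)    # reference (snapshot) date

first_90 = window_spend(txns, signup, signup + timedelta(days=90))
last_90 = window_spend(txns, as_of - timedelta(days=90), as_of)

# Ratio above 1 means the customer spends more now than when they joined.
# It is left undefined (None) when there was no early spend.
spend_ratio = (
    last_90["total"] / first_90["total"] if first_90["total"] else None
)
```

The same helper can be reused with other window lengths (7 days, 30 days and so on) to produce a whole family of features from one transaction table.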

Interactional features

Interactional features are similar to transactional features in the sense that they are both recurring, although the former do not involve financial matters. We consider two-way interactions initiated by either the company or the customer. Interactions can be direct, such as email marketing, SMS, customer calls or complaints. They can also be indirect, such as a customer visiting a company’s web pages. Each interaction can be thought of as an event, which can be categorised according to business activities. Because they are computed over a period of time like behavioural features, we end up with attributes representing event counts. To give a few examples, they can be the count of how many times a customer visited a particular web page in the last 30 days; how many times a customer called to complain about a product or service; or how many times the company successfully reached the customer.

As before, these features allow us to measure the changes in the levels of interactions at different time periods. Such measurements indicate the degree of attachment and responsiveness of a customer, which can be a very useful feature when predicting future customer intents.
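
Event counts of this kind can be sketched with a simple counter over an event log. The event types and the 30-day window below are assumptions for illustration; real categories would follow your own business activities.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical event log for one customer: (date, event_type).
events = [
    (date(2024, 2, 25), "web_visit"),
    (date(2024, 2, 20), "web_visit"),
    (date(2024, 2, 18), "complaint_call"),
    (date(2024, 1, 5), "web_visit"),
    (date(2024, 2, 27), "email_opened"),
]


def event_counts(events, as_of, days):
    """Count events per type in the `days` days before `as_of`."""
    start = as_of - timedelta(days=days)
    return Counter(kind for d, kind in events if start <= d < as_of)


counts_30d = event_counts(events, as_of=date(2024, 3, 1), days=30)
```

A Counter returns 0 for event types that never occurred in the window, which is convenient when turning the counts into a fixed set of feature columns.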

Based on my personal experience, these three sets of features can be implemented quickly if you are already familiar with the data of your business. Coupling these features with robust modeling methods like random forest or boosted trees often results in a reasonably good initial model. For binary classification, I usually get an AUC above 0.70 in the first run, which surpasses the performance level required for some practical applications.
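
For readers less familiar with AUC, it can be read as the probability that a randomly chosen positive example receives a higher model score than a randomly chosen negative one. The sketch below computes it directly from that definition on made-up labels and scores; it is meant to illustrate the metric, not to replace a library implementation, and it compares every positive/negative pair, so it only suits small samples.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = the customer churned, 0 = the customer stayed.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
model_auc = auc(labels, scores)
```

Here five of the six positive/negative pairs are ordered correctly, so the AUC is 5/6, roughly 0.83.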

Signing off note: In this post I’ve actually only described feature generation, a precursor to the whole feature engineering process. Often further processing or transformation of the generated features may be needed, for example by normalising, scaling, or discretising continuous variables, especially when using models that are sensitive to the magnitude of feature values.
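
As a pointer for that next step, here is a minimal sketch of the three transformations mentioned above, applied to a made-up total-spend feature. The function names and bin edges are hypothetical; in a real project these transformations should be fitted on training data only and then applied unchanged to new data.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def standardise(values):
    """Normalise to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def min_max_scale(values):
    """Scale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def discretise(value, edges):
    """Bin index for `value`, given ascending bin edges.

    With edges [50, 150]: below 50 -> 0, from 50 up to (not including)
    150 -> 1, and 150 or above -> 2.
    """
    return bisect_right(edges, value)


total_spend = [40.0, 60.0, 200.0]  # one made-up value per customer

z_scores = standardise(total_spend)
scaled = min_max_scale(total_spend)
bins = [discretise(v, edges=[50.0, 150.0]) for v in total_spend]
```

Tree-based models such as random forest are largely insensitive to monotonic rescaling, so these steps matter most for linear models, neural networks and distance-based methods.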