Sell your by-products, data science edition

From Rework (chapter "Sell your by-products"): Manufacturing companies must deal with their waste; this often leads them to create by-products and sell them. Henry Ford built a charcoal plant to produce briquettes from wood scraps left over from the production of the Model T. In non-manufacturing industries, potential by-products may not be readily noticeable. But, … Continue reading Sell your by-products, data science edition

Tone is in your fingers – What is the best machine learning tool?

From Rework: When you play music, the quality of the melody that comes out depends largely on how skilled you are. It doesn't matter if you have the best guitar in the world; your music cannot be as good as that of a top guitarist. If you're not … Continue reading Tone is in your fingers – What is the best machine learning tool?

Predicting rental listing interest – Kaggle competition

So I recently participated in a Kaggle competition (final ranking 103 / 2488). I had intended to play with the data for a bit and build a prototype / baseline model, but I ended up getting addicted and followed through till the end of the competition. It was such a fun experience that I thought I'd share with you … Continue reading Predicting rental listing interest – Kaggle competition

Google Analytics in BigQuery, explained in one query

Google Analytics (GA) is a popular suite of analytics tools used by many companies to track customer interactions on their digital channels. Although it offers plenty of built-in capabilities for discovering insights, there are times when you want to dive deeper and run your own analyses. This post will help you understand the Google Analytics … Continue reading Google Analytics in BigQuery, explained in one query

Visualising thousands of customer journeys

Exploratory analysis of a dataset is a critical step at the beginning of any data science project. This often involves visualising the data, for example by plotting the data with histograms or box plots (for individual dimensions / features), by using a scatter plot (for pairs of features), or by looking at the correlation matrix. (Side note: … Continue reading Visualising thousands of customer journeys

The objectives of customer segmentation

Customer segmentation is a practice widely used by companies to divide their customer base into sub-groups that share similar characteristics, and then deliver targeted, relevant messages to each group. Segmentation is done by looking at customer attributes such as demographics (e.g. age, gender, income, residential address) and / or transactional patterns (e.g. RFM or … Continue reading The objectives of customer segmentation

The 20% guide to a good single view of customer

All businesses revolve around customers and products / services offered to them. These days companies compete on the ability to accurately predict customer intents with respect to their products in order to best serve them. Examples of intents are the potential to purchase an item, cancel a monthly subscription to a service, or close an account. … Continue reading The 20% guide to a good single view of customer

Data transformation, Scala collections, and SQL

Data transformation is one of the 3 steps in ETL (extract, transform, load) -- a process for getting raw data from heterogeneous sources (e.g. databases, text files), processing or transforming it, then loading it into the final destination (e.g. in a format ready for further modelling or analysis). While there exist a plethora of languages for … Continue reading Data transformation, Scala collections, and SQL