Predicting rental listing interest – Kaggle competition

So I recently participated in a Kaggle competition (final ranking 103 / 2488). I had intended to play with the data for a bit and build a prototype / baseline model, but ended up getting addicted and following through to the end of the competition. It was such a fun experience that I thought I’d share my learnings with you.

About the competition

The goal of this competition was to predict how popular an apartment rental listing is based on the listing content (text description, photos, number of bedrooms, number of bathrooms, price, latitude, longitude, etc.). Popularity is categorised into three levels (high, medium, and low) based on the number of inquiries a listing received while it was live on the site. The data comes from renthop.com, an apartment listing website, and the apartments are located in New York City.

Dirty data

One draw of Kaggle competitions is that you can work with real data sets, which are guaranteed to be ‘dirty’. Some data issues for this competition include:

  • Addresses: it seems that when listing a property, managers or owners enter the address as free text, so there are quite a few variations of the same thing. For example, 42nd 10 East Avenue can be entered as 42nd 10 e av, 42 10 east avenue, 42nd 10 e av., etc. These variations are regular enough that they can be processed with, well, regular expressions. What would have been ideal, though, is a database of streets to choose from so that the data stays consistent.
  • Outliers: there are properties with a listed rent of over 100k, and properties with a latitude and longitude of 0.0.
  • Missing data: there was also quite a lot of missing data, for example the building IDs, which uniquely identify buildings (e.g. apartment blocks).

How to deal with outliers and missing data? For this dataset, there were opportunities to impute missing or outlying values from other features. For example, a listing’s missing latitude and longitude can be inferred from other listings that share the same building id or display address and do have coordinates.
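As a concrete sketch of such imputation, assuming the training data has been loaded into a pandas DataFrame with latitude, longitude, and building_id columns (names based on the competition’s JSON fields), the zero coordinates can be treated as missing and filled from other listings in the same building:

import numpy as np
import pandas as pd

df = pd.read_json("train.json")  # competition training file

# Treat (0, 0) coordinates as missing.
df.loc[(df["latitude"] == 0) | (df["longitude"] == 0),
       ["latitude", "longitude"]] = np.nan

# Borrow the median location of other listings in the same building.
for col in ["latitude", "longitude"]:
    df[col] = df[col].fillna(
        df.groupby("building_id")[col].transform("median"))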

Label leakage

When building supervised models, label leakage should be avoided at all costs. On Kaggle, however, if leaky information exists it is almost impossible to win a competition without using it. And that is what happened in this competition.

What happened (I think) comes down to the way the data was prepared. Each listing comes with a set of images, and it appears that the images for each class / interest level were stored in their own folder. These were then copied for distribution to Kagglers, and because of the disproportionate number of instances per class, the images were written at different timestamps (much later than the actual listing creation date). As a result, there is a strong correlation between the image creation timestamp and the interest level.

The right thing to do would have been to prohibit the use of such a feature. Regrettably, Kaggle only encourages publicising a leak once it is found. So most people ended up using this feature in their models, which distorts the actual performance were such a model to be put into practice.

What affects listing popularity?

Feature engineering was definitely a fun exercise in this competition, and there was a lot of room for creativity. I approached it by drawing on my own experiences searching for a rental property and coming up with criteria that matter to me or other renters. Of course, I also borrowed impactful features discussed on the forum.

Below are some of the features that contributed heavily to the model. Some are naturally expected, while others require some reflection but also make a lot of sense.

  • The property’s features: bedrooms, bathrooms, price
  • Value for money: along with the above features comes value for money, which can be determined in a number of ways. For example, it can be computed by comparing an estimated price for the listing against the actual price, or by comparing the price with that of properties in the same building, street, or neighbourhood (see the sketch after this list).
  • Location is the key: location is encoded in variables such as latitude, longitude, building id, display address, and street address. These can be used directly or in combination with other information, thus creating second-order interactions.
  • Skills of the managers: this came as a surprise at first, as one would expect the desirability of a property to have very little to do with who manages it. But on reflection, good managers can i) acquire quality properties, e.g. in prime locations or good condition, and ii) know the ingredients needed to attract viewers, e.g. setting the right price, along with other potential benefits. So even though the causation is weak, the correlation is strong.
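To illustrate the value-for-money idea, a couple of such features can be derived in pandas; the column names (price, bedrooms, building_id) are assumptions based on the competition data, not the exact features I used.

import pandas as pd

# df is assumed to hold the listings with 'price', 'bedrooms', 'building_id' columns.
# Price per bedroom, guarding against studio listings with zero bedrooms.
df["price_per_bed"] = df["price"] / df["bedrooms"].clip(lower=1)

# Ratio of a listing's price to the median price in the same building:
# values well above 1 suggest poor value for money.
building_median = df.groupby("building_id")["price"].transform("median")
df["price_vs_building"] = df["price"] / building_median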

The unexpected bits

Interestingly, neither the textual description of a listing nor its accompanying photos contributed much to model performance. I did not have the time or resources to deal with over 80GB of image data, but others reported little performance gain from incorporating this information.

Deployment challenges

The solutions to this competition cannot readily be deployed into production, as there are a few things to consider. First, only three months’ worth of data was given, and the training and testing data were randomly split. It would be better to use at least two years of data and out-of-time sampling to account for seasonality effects. Second, we need to check that all features used in the winning solution are available at the time of making a prediction, which could be when the listing is first created or in real time. Third, as in the Netflix Prize competition, the winning solutions are based on stacking and use quite a lot of models. Depending on how deployment is done, it may not be possible for the engineers / scientists to use all of the models in their system due to complexity or computational constraints.

What wins competitions?

It’s been well publicised that feature engineering wins Kaggle competitions, and that is still true. But model stacking or ensembling almost always increases performance as well, if done properly.

  • Feature engineering: big gains in performance always come from good feature engineering
  • Stacking or ensembling: while feature engineering is crucial to building a very good model, stacking and ensembling can deliver the final boost needed to win a competition (a minimal sketch follows this list)
  • Experience: Competing in Kaggle is like a sport, and so experience is vital. Expert Kagglers know the best practices, have scripts that can be re-used, and prior knowledge from other competitions that give them an edge in any competition.
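To make the stacking idea concrete, here is a minimal sketch using scikit-learn’s StackingClassifier. X and y stand for the engineered feature matrix and the interest labels; the base and meta learners are illustrative choices rather than the ones used by the winning teams.

from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X, y are assumed to be the engineered features and interest labels.
stack = StackingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier()),
        ("rf", RandomForestClassifier(n_estimators=300)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                        # out-of-fold predictions feed the meta-model
    stack_method="predict_proba",
)
scores = cross_val_score(stack, X, y, cv=3, scoring="neg_log_loss")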

Actions for RentHop

  • Upgrade their listing service to better capture clean data
  • Re-think how to best leverage their data and incorporate predictive modelling into their business. This competition has demonstrated that it’s possible to predict the level of interest well, but the more important question is what to do with it. Can interest levels be pushed up easily in operation? Recall from above that manager skills and property features such as location, price, and the number of bedrooms and bathrooms are the key model attributes. But these are hard-to-move needles for RentHop, as it does not have much control over them. On the other hand, other metrics, such as minimising the time it takes to find and lease a property, would be easier to optimise for.

Overall it has been a thrilling experience which provided many valuable lessons. Thank you Kaggle and fellow Kagglers! You can find my (uncleaned) code at https://github.com/trungngv/kaggle/tree/master/renthop if interested.

Google Analytics in BigQuery, explained in one query

Google Analytics (GA) is a popular suite of analytic tools used by many companies to track customer interactions on their digital channels. Although it offers plenty of built-in capabilities for insights discovery, there are times when you want to deep dive and run your own analyses. This post will help you understand the Google Analytics data that is exported to BigQuery and how to extract the key pieces of information that you need.

Understanding the data structure

  • BigQuery stores the exported data by date, with each day in its own table. For instance, the data for 15 March 2017 would be stored in 1300.ga_sessions_20170315, where 1300 is the project id and 20170315 is the date in yyyymmdd format. Data for the current date is stored in an intraday table, e.g. 1300.ga_sessions_intraday_20170315.
  • Each table contains all sessions by users, one row per user session. A session is simply a sequence of pages viewed by the user (or in GA terminology, page hits).

For analytical tasks, we want to be able to identify users and sessions.

Identifying unique users

Users can be divided into two categories: logged in and not logged in (guests), of which only the former can be reliably identified. Logged-in users can be associated with customers if you set and send their identifiers programmatically, either via the userId field or custom dimensions that you define. Guests can be identified via fullVisitorId, but this is reset whenever users clear their cookies or use multiple devices. In fact, the mapping between userIds and fullVisitorIds is N-to-N, so they can’t be reliably linked.

Take-away message: set, send, and use userId to uniquely identify customers.

Identifying unique sessions

The GA documentation recommends using fullVisitorId + visitId to get a globally unique session identifier (within your GA data source). But for logged-in users, we should actually use userId + visitStartTime to identify each user’s sessions, where visitStartTime is the start time of a session. Let me illustrate with a toy example:

visitId | fullVisitorId | userId | visitStartTime
v1      | f1            | u1     | 1000000
v1      | f2            | u1     | 1500000

Here we have one user, u1, who is mapped to two different visitor ids in two different sessions. The visitIds happen to be the same in both sessions, so using userId + visitId we would get only one session where in fact there are two. Using userId with visitStartTime is the right combination, as a user can’t have two sessions that start at exactly the same time. If we want to be 100% certain that sessions are unique, we can use userId + visitId + visitStartTime.
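A quick pandas check on the toy example confirms the point; the DataFrame below simply reproduces the two rows of the table above.

import pandas as pd

sessions = pd.DataFrame({
    "visitId":        ["v1", "v1"],
    "fullVisitorId":  ["f1", "f2"],
    "userId":         ["u1", "u1"],
    "visitStartTime": [1000000, 1500000],
})

# userId + visitId collapses the two real sessions into one.
print(sessions.groupby(["userId", "visitId"]).ngroups)          # -> 1

# userId + visitStartTime keeps them distinct.
print(sessions.groupby(["userId", "visitStartTime"]).ngroups)   # -> 2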

Example reference query

Now that we know how to identify users and sessions, let me give you one reference query that covers the main concepts you need to know to work with this data.

Note that BigQuery’s Standard SQL dialect is compliant with the SQL 2011 standard and supports complex types such as arrays and structs. Legacy SQL is also supported, but I encourage the use of Standard SQL, as in the query below.

select
  b.account_number
  , concat(cast(visitId as string), '_', cast(visitStartTime as string)) as session_id
  , hits.type as hit_type
  , hits.hitNumber as hit_number
  -- drop any query string from the second path level
  , concat(hits.page.pagePathLevel1,
      regexp_replace(hits.page.pagePathLevel2, '\\?.*$', '')) as page_level12
  , hits.appInfo.screenName
from `project_id.ga_sessions_201702*` s, s.customDimensions as custdim,
     s.hits, `project_id.account_numbers` b
where custdim.index = 1
  and custdim.value = cast(b.account_number as string)
  and timestamp_diff(timestamp_seconds(visitStartTime), b.first_online, HOUR) < 24
  and hits.type in ('APPVIEW', 'PAGE');

The query extracts all pages visited by each user on apps and websites within their first day online. Here are the main points:

  • account_number is used in place of userId for logged-in users. It comes from an external data source, for example your customer database table, and is set and sent to GA via the first custom dimension, which we retrieve with the condition custdim.index = 1.
  • a unique session identifier is obtained by concatenating visitId and visitStartTime, as discussed above
  • hits is an array of structs containing information about each hit / page view. There are several different hit types, but here we limit them to ‘APPVIEW’ for app interactions and ‘PAGE’ for website interactions
  • the sessions table is implicitly joined with its hits column to flatten the table (one row per hit)
  • hits.hitNumber gives us the order of page views within a session
  • a wildcard is used to select the tables (hence dates) to query; here we are looking at data from February 2017 only
  • hits.page.pagePathLevel{1 to 4} gives the web page, and hits.appInfo.screenName gives the app page
  • timestamp_diff, timestamp_seconds are date time functions
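If you prefer to run such queries from Python rather than the web interface, a minimal sketch using the google-cloud-bigquery client library might look like this. The project and table names are the same placeholders as above, and to_dataframe() additionally requires pandas to be installed.

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="project_id")  # placeholder project id

sql = """
select concat(cast(visitId as string), '_', cast(visitStartTime as string)) as session_id,
       hits.type as hit_type,
       hits.page.pagePathLevel1 as page_level1
from `project_id.ga_sessions_201702*` s, s.hits
where hits.type in ('APPVIEW', 'PAGE')
"""

df = client.query(sql).to_dataframe()  # runs the job and loads the results into pandas
print(df.head())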

Side note

The BigQuery web interface is not yet fully fledged. In fact, I found it quite limiting at first, as it does not allow creating tables directly from a query and has only one window for writing queries. But auto-completion (which works for table names, columns, and functions) and pop-up documentation are two absolute killer features. In this respect I like it much better than the boring SQLWorkbench/J client that I’ve been using.

Recommended books for data scientists

In this post I’d like to share some of my recommended books for learning data science and machine learning, in both theory and practice. Fellow practitioners, let me know your favourite books or any other related resources; I’d be keen to check out some new books and add them to my list.

Theory

These are all foundational textbooks in machine learning. I recommend studying at least one of them in depth, by which I mean formulating the models, deriving and implementing the main inference algorithms, and doing the exercises. The books can be quite technical if you’re new to machine learning, but once you work through one, you’ll find the others quite accessible.

The Elements of Statistical Learning (ESL), by Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
One of the classics; there’s also an online course and a newer textbook accompanied by R code.

Pattern recognition and machine learning (PRML), by Christopher Bishop

Similar to ESL, this highly regarded book is another must-read.

Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy

If you study PRML thoroughly, you’ll be familiar with most of the content in Murphy’s book. Nevertheless, it is a fun and comprehensive book with a strong focus on a principled, probabilistic approach to modelling. It also comes with Matlab code.

Probabilistic Graphical Models, by Daphne Koller and Nir Friedman

Graphical models provide a framework for the representation, inference, and learning of probabilistic models. This powerful framework gives a unifying view of many ML models that might otherwise be seen as a bunch of disparate approaches. There’s also an online course on Coursera.

Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto

Although still a draft, the second edition is well written and motivates the concepts and applications of RL really well.

Neural networks and deep learning, by Michael Nielsen
Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Michael Nielsen’s book is more hands-on and contains some cool interactive content to aid understanding, while Goodfellow et al. is more comprehensive. I recommend reading them in that order.

Gaussian processes for machine learning, by Carl E. Rasmussen and Christopher K. I. Williams

This last book is on Gaussian processes, my PhD research topic. You don’t really need it for practical data science, but it is still a good reference, and the first few chapters present the Bayesian approach to modelling and are worth reading.

Practice

Data science for business, by Foster Provost and Tom Fawcett

This book is accessible to a non-technical audience, such as business managers. It also provides some sound principles on how to execute data science projects. Highly recommended.

Applied predictive modelling, by Kjell Johnson and Max Kuhn

Co-written by the author of the popular R package caret, this is a must-read for anyone practising data science. It contains many practical tricks and much useful advice.

 

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, by Gordon S. Linoff and Michael J. A. Berry

Don’t let the title mislead you; this is a good read on data science techniques in general, not just in the CRM space.

Data preparation for data mining, by Dorian Pyle

Published in 1999 but still very relevant today, this book serves as a good checklist of things to inspect when preparing data for analysis.

Bandit algorithms for website optimization, by John Myles White

This book presents standard multi-armed bandit algorithms and comes with implementations in several languages.

Practical data science with R, by John Mount and Nina Zumel

Not as polished as Johnson and Kuhn’s book, but it has a few neat techniques worth knowing.

Visualising thousands of customer journeys

Exploratory analysis of a dataset is a critical step at the beginning of any data science project. It often involves visualising the data, for example with histograms or box plots (for individual dimensions / features), scatter plots (for pairs of features), or the correlation matrix. (Side note: in R, the corrplot package is great for a graphical display of the correlation matrix, which is more effective than inspecting a matrix of numbers.)

These visualisations are a great starting point, but they are designed for viewing static data. What if your dataset is more dynamic? For example, each data point is a snapshot of a customer at some point in time, and there are multiple data points forming a ‘journey’ for each customer. You probably want to see how customer behaviours evolve over time. This is easy for a single customer, as you can just plot a time series along the customer dimension you care about. But most of the time you also want to look at the entire population to discern any trends or patterns that may allow you to serve your customers better. Enter the heat map.

A heat map is a great way to visualise multiple customer journeys simultaneously on the same plot. In general, you use a heat map when you want to plot the values of a matrix, where the two axes correspond to the rows and columns and the values are mapped to a continuous colour scale. In our context of customer journeys, each row visualises a customer, while the columns show the customer’s behaviour (e.g. transaction amount, number of transactions, etc.) in time order. The colour of a cell indicates the strength of the behaviour.
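As a minimal sketch with pandas and matplotlib, assuming a long-format DataFrame journeys with one row per (customer_id, week, spend) observation, the journey matrix and heat map can be produced as follows:

import matplotlib.pyplot as plt
import pandas as pd

# Pivot into a customer x week matrix; weeks with no purchase become zero spend.
mat = journeys.pivot_table(index="customer_id", columns="week",
                           values="spend", aggfunc="sum", fill_value=0)

# Order the rows by total spend over the analysis period.
mat = mat.loc[mat.sum(axis=1).sort_values().index]

plt.figure(figsize=(8, 6))
plt.imshow(mat.values, aspect="auto", cmap="viridis")
plt.colorbar(label="weekly spend")
plt.xlabel("weeks since joining")
plt.ylabel("customers (ordered by total spend)")
plt.show()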

[Figure: heat map of customers’ weekly spend since joining, rows ordered by total spend]

The above plot shows a subset of customers and their spend for each week since they first joined. The matrix rows are ordered by their total values, i.e. by each customer’s total spend over the analysis period. It’s immediately clear that there is a group of customers who only purchased in the first month and were gone after that. Note that the y-axis represents row IDs and carries no meaning in this plot. Upon seeing this, one may decide to dig deeper into this set of early churners to understand why they left and whether any actions could improve their retention.

Hopefully this post has given you another tool for understanding your customers beyond the standard visualisation toolkit. The great thing about heat maps is that they scale easily: once you organise your data in some order, you can sweep through as much of it as the graphical capabilities of your tool or machine allow and still discern patterns in your data.

 

The objectives of customer segmentation

Customer segmentation is a practice widely used by companies to divide their customer base into sub-groups that share similar characteristics, and then deliver targeted, relevant messages to each group. Segmentation is done by looking at customer attributes such as demographics (e.g. age, gender, income, residential address) and / or transactional patterns (e.g. RFM: the recency, frequency, and monetary value of their transactions). One key challenge often encountered when doing this is how to measure the goodness of your segmentation.

Qualitative and mathematical objectives

A commonly agreed, qualitative objective for a good segmentation (or clustering, as it is referred to in machine learning) is that similar customers should be in the same group and different customers should be in separate groups. This criterion can be inspected visually if your data has low dimensionality (typically fewer than 4 dimensions), as in the figure below (image source: http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/mvoget/cluster/kmeans_diagram.png). There we see two distinct coloured clusters, each with a point at its centre called the cluster centroid. If each data point corresponds to a customer, the centroid can be thought of as the most representative member of the group.

If we know the representative of each group, then a natural segmentation mechanism is to find the representative most similar to a customer and assign him or her to that group. This idea is used in the popular k-means clustering method, whose objective is to minimise the total within-group difference between customers, summed across all groups. So one convenient way to evaluate the quality of a segmentation, for example when choosing the number of segments to use (let’s call it k), is to compute this objective for different values of k and choose the one with the smallest total difference. The disadvantage of this approach, though, is that the mathematical objective may not align with your business strategy, and the solution may look like a black box, making the resulting segmentation not actionable.
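For reference, the "vary k and compare objectives" approach takes only a few lines with scikit-learn. X is assumed to be a customer-by-attribute matrix (e.g. RFM features), and the objective reported by KMeans is the within-cluster sum of squared distances.

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X)  # X: customers x attributes

inertia = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertia[k] = km.inertia_  # within-cluster sum of squared distances

# Pick the k where the curve starts to flatten out (the 'elbow').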

Segmentation with a business objective

Instead of approaching customer segmentation purely from an optimisation perspective, it is important to tie it to your business objective and make sure that you have designed a specific strategy for each of the final segments. For example, in one project we looked at customer behaviour in a short period after acquisition and divided customers into two segments: high-value and mass. The high-value segment contained only 15% of the new customers but accounted for 70% of future lifetime value. This allowed the business to create two bespoke customer journeys and allocate more resources to retaining the more valuable customers. In this case the segmentation was determined by maximising the number of high-value customers that could be served, given the available budget for this segment. This is still a constrained optimisation, but it is driven by a business objective and is therefore easier to execute with marketing campaigns.

The 20% guide to a good single view of customer

All businesses revolve around customers and the products / services offered to them. These days companies compete on the ability to accurately predict customer intents with respect to their products in order to serve them best. Examples of intents are the potential to purchase an item, cancel a monthly subscription to a service, or close an account. Such an ability relies heavily on how well a firm knows its customers, and most firms will benefit from having the so-called single view of customer. This view contains hundreds to thousands of attributes that enable detailed customer insights. In this post, I will provide some guidance on how to quickly generate a good view of customer. My hope is that by following the recommendations here, it may take you only 20% of your total (feature engineering) effort to attain 80% of the optimal modelling performance.

The importance of feature engineering

It’s widely believed that feature engineering is one of the most effective techniques for improving the performance of machine learning models. Kaggle founder and CEO Anthony Goldbloom claimed that “feature engineering wins competitions”, based on the history of thousands of competitions on the site. While this task depends on the specific application and is often considered more of an art than a science, there are a few guiding practices that are broadly applicable across a wide range of use cases. These are motivated by the fact that there are three main entities (with respect to the goal of modelling) in most business contexts, namely the customer, the product, and the company itself. As predictive models are often customer-centric, the single view should capture information about an individual customer and how he or she interacts with the two remaining entities.

Based on the above motivation, the single view can be divided into the following three main categories.

Descriptive features

Descriptive features are customer characteristics that are typically captured at a fixed point in time. They include information such as demographics (age, gender, marital status, employment status, and residential address such as suburb or post code). Features related to how the customer was acquired can also be valuable, for example which channel they signed up through (online vs. offline), whether they received any promotion, and on which day of the week they joined.

Behavioural / transactional features

Unlike descriptive features, behavioural or transactional features are more dynamic and are typically computed over repeated transactions within some period of time. Recurring transactions occur in telecommunications, banking, insurance, entertainment, and many other retail businesses. A transaction provides granular information, for example the transaction amount or price, the time, and the product / item itself and its category, which is invaluable in helping us understand customer preferences. Transactional features are often created along three dimensions (a small sketch follows the list below):

  1. Recency: When was the first time or last time a transaction took place?
  2. Frequency: How often does a customer purchase? How does that break down into different product categories?
  3. Monetary: What was the value of the transaction? Because this is a continuous (real-valued) measurement, and there are multiple transactions per customer, these features are aggregated. In other words, you compute the min, max, sum, average, median, standard deviation, and other summary statistics of all transaction amounts during the period under consideration.
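Here is the pandas sketch referred to above; transactions is an assumed DataFrame with one row per purchase and customer_id, timestamp, and amount columns.

import pandas as pd

snapshot = transactions["timestamp"].max()  # reference date for recency

rfm = transactions.groupby("customer_id").agg(
    first_purchase=("timestamp", "min"),
    last_purchase=("timestamp", "max"),
    frequency=("amount", "size"),
    total_spend=("amount", "sum"),
    avg_spend=("amount", "mean"),
    max_spend=("amount", "max"),
)
rfm["recency_days"] = (snapshot - rfm["last_purchase"]).dt.days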

It may be worth emphasising again that behavioural and transactional features are computed over a period of time, so customer behaviour at any point in the lifecycle can be analysed by varying the time window. For example, you may look at these features over the first day, first week, first month, first 3 months, last 3 months, last month, or last week of a customer’s tenure, depending on the nature of the purchasing cycle in your business. You can also compare recent behaviour to the initial behaviour when customers first joined to see how they have evolved. Have they become more valuable, or are they spending less with your firm?

Interactional features

Interactional features are similar to transactional features in that they are both recurring, although the former do not involve financial transactions. We consider two-way interactions initiated by either the company or the customer. Interactions can be direct, such as email marketing, SMS, and customer calls or complaints. They can also be indirect, such as a customer visiting the company’s web pages. Each interaction can be thought of as an event, which can be categorised according to business activities. Because they are computed over a period of time, like behavioural features, we end up with attributes representing event counts: for example, how many times a customer visited a particular web page in the last 30 days, how many times a customer called to complain about a product or service, or how many times the company successfully reached the customer.
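A sketch of such event-count features in pandas, assuming an events DataFrame with customer_id, event_type, and timestamp columns:

import pandas as pd

# Keep the last 30 days of interactions.
cutoff = events["timestamp"].max() - pd.Timedelta(days=30)
recent = events[events["timestamp"] >= cutoff]

# One column per event type, counting occurrences per customer.
event_counts = (recent.pivot_table(index="customer_id", columns="event_type",
                                   values="timestamp", aggfunc="count",
                                   fill_value=0)
                      .add_prefix("n_30d_"))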

As before, these features allow us to measure the changes in the levels of interactions at different time periods. Such measurements indicate the degree of attachment and responsiveness of a customer, which can be a very useful feature when predicting future customer intents.

In my personal experience, these three sets of features can be implemented quickly if you are already familiar with your business’s data. Coupling them with robust modelling methods like random forests or boosted trees often results in a reasonably good initial model. For binary classification, I usually get an AUC above 0.70 in the first run, which surpasses the accuracy level required for some practical applications.
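A baseline along these lines might look as follows; X combines the three feature categories and y is a binary intent label, both assumed to exist already.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Hold-out AUC: {auc:.3f}")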

Signing off note: in this post I’ve actually only described feature generation, a precursor to the whole feature engineering process. Further processing or transformation of the generated features may be needed, for example normalising, scaling, or discretising continuous variables, especially when using models that are sensitive to the magnitude of feature values.
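For completeness, a minimal scikit-learn sketch of such transformations, with X_num assumed to hold the continuous columns of the single view:

from sklearn.preprocessing import KBinsDiscretizer, StandardScaler

X_scaled = StandardScaler().fit_transform(X_num)  # zero mean, unit variance

# Alternatively, bucket a skewed feature (e.g. total spend) into quintiles.
binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
X_binned = binner.fit_transform(X_num)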

 

A brief study of hotels in Vietnam

This is a brief study I did over the weekend of hotels in three popular travel destinations in Vietnam, namely Hanoi, Nha Trang, and Phu Quoc.

Key observations:

– Hanoi ranks first in total hotel capacity, followed by Nha Trang, with Phu Quoc quite far behind.
– 2-to-3-star hotels dominate the market in Hanoi, whereas 1-star and 4-to-5-star hotels are more prevalent in the other two regions.
– Hanoi hotels are more established and have higher average review scores than those in Nha Trang and Phu Quoc. This may also indicate that Hanoi is more competitive.
– Most visitors come from Australia, the US, and Europe, with some from our neighbours (Singapore, Thailand, Malaysia).

1. Total hotel capacities
[Figure: total hotel capacities by city]

2. Distribution of hotel capacities
[Figure: distribution of hotel capacities (Hanoi)]

3. Hotel star-ratings
[Figure: hotel star ratings (Hanoi)]

4. Review scores
[Figure: review scores (Phu Quoc)]

5. Guest nationalities
Hanoi
[Figure: guest nationalities (Hanoi)]

Nha Trang
[Figure: guest nationalities (Nha Trang)]

Phu Quoc
[Figure: guest nationalities (Phu Quoc)]

6. Hotel size, locations, and ratings

Each solid circle corresponds to a hotel; its area is proportional to the capacity of the hotel, and the color is mapped to its review score.

[Figures: hotel size, location, and review score for Hanoi, Nha Trang, and Phu Quoc]