Sell your by-products, data science edition

From Rework (chapter "Sell Your By-products"): Manufacturing companies must deal with their waste; this often leads them to turn it into by-products and sell them. Henry Ford built a charcoal plant to produce briquettes from the wood scraps left over from production of the Model T. In non-manufacturing industries, potential by-products may not be readily noticeable. But, …

Predicting rental listing interest – Kaggle competition

So I recently participated in a Kaggle competition (final ranking 103 / 2488). I had intended to play with the data for a bit and build a prototype / baseline model, but I ended up getting addicted and followed through till the end of the competition. It was such a fun experience that I thought I'd share with you …

Data transformation, Scala collections, and SQL

Data transformation is one of the three steps in ETL (extract, transform, load), a process for getting raw data from heterogeneous sources (e.g. databases, text files), processing or transforming it, then loading it into its final destination (e.g. in a format ready for further modelling or analysis). While there exist a plethora of languages for …

Linux for Data Scientists

The first step towards becoming a data scientist is to become familiar with Linux. EdX offered a great introductory course by the Linux Foundation, which covers basic to intermediate material. Important topics include: Linux philosophy and concepts; command line operations (basic operations and working with files); file operations; user environment; text editors (vi/vim and emacs); …