With data comes responsibility

I’ve been reading quite a lot on the web about how to become a great data scientist. The most comprehensive resource on this question is perhaps this Quora post: https://www.quora.com/How-can-I-become-a-data-scientist. Most of the answers focus on specific theoretical background (e.g. machine learning) or technical know-how (e.g. data munging, feature engineering). I agree that these are all essential skills for day-to-day work, but there are other important qualities that we should acquire as well.

One example that came to mind today is the need to be aware that “with data comes (big) responsibility”. This has to do with the fact that most analytical work is done independently and in private. This contrasts starkly with software development, where collaboration puts one or more extra pairs of eyes on every line of code that goes into production, and rigorous testing and verification further reduce the number of bugs. Data analysis is a different animal, though. The goal of analysis is to find meaningful but previously unknown signals and patterns in the data. Because the signals are unknown, it is hard to verify the findings of an analysis. But since the data scientist is perceived as the person most knowledgeable about the data, their conclusions will often be taken as the truth. These conclusions then become “actionable insights” that are used to improve the business, e.g. to redesign the interface of a website or an app. At big companies this can have a strong impact on business operations and affect millions of customers. As such, an effective data scientist must become the owner of their data, question and verify every hypothesis raised about the data, and think hard about the implications of every conclusion drawn from their analysis.
