What is the essential skill a data scientist should have? Whether you want to pursue a career in data science or you have been looking to hire a data scientist, you have undoubtedly pondered this question. Possible answers include Python, R, Hadoop, NoSQL, artificial intelligence, natural language processing, machine learning algorithms, statistics, math, data visualization, business acumen and storytelling, to name only a few. So… it is not easy to declare a winner.
In this article, I will attempt to take a different path to answering this question – starting with the why question and working back toward the skills needed for success. Without spoiling it, let me just say that the answer is not contained in the above list.
WHY are data scientists hired?
Data scientists are not hired to use programming language X or database Y but to solve complex problems. This means finding a solution (Output) given a situation (Input). For the math enthusiasts, it can be written as: Output = F(Input). The data scientist's mission is to find the function F.
WHAT kind of problems do data scientists solve?
In my experience, problems can be classified into three categories:
Type I problems: Clear Input, Clear Output. “Clear” here means there is nothing ambiguous about what is being given or asked for.
• Most of the problems we solved back in school are of this type.
• Examples: What is the average call duration in our service center? What is the monthly revenue across all business units?
Type II problems: Clear Output, Unclear Input.
• In their most common form, they are known as Fermi problems, the bread and butter of strategy consulting and Wall Street interviews. Google loves them too.
• Examples: How many people have seen our latest TV ad? How much money is spent on recruitment activities in Paris per year? The output is well-defined (number of people or Euros) but the input is unclear.
Type III problems: Unclear Output, (Un)clear Input
• This is the most common type found in a business setting.
• Examples: Can you have a look at the mid-market segment and see if there is “anything” there? Should we focus more on online growth or offline efficiency?
If most of the problems you deal with are of Type III, then you will need to add an “art” component to the science of problem-solving.
HOW do data scientists solve problems?
Any insight a data scientist produces comes by way of statistical analysis and is consequently statistical in nature. For example, the statement “First-grade pupils are taller than kindergarten children”, while valid, will not hold for every pair of children. The notions of probability, uncertainty and confidence level are inherent to data science.
Now, for illustration, let’s assume your boss has asked you the Type III question given above:
“Should we focus on online growth or offline efficiency?”
How do you go about finding the solution? A possible path is outlined below:
♦ As the Output is unclear, the very first thing is to clarify it. What is your boss after? Is he expecting some high-level research to fuel a workshop on the topic? Or perhaps a few recommendations? Or a detailed action plan? Before you kick off an analysis, ask the right questions to get a clear picture of what the Output should look like. If you succeed in doing this, then congratulations: you have just transformed your Type III problem into a Type I or Type II situation, which is easier to crack.
♦ Let’s now assume the Output is linked to a profitability measure called X. The next step is to understand how X depends on various quantities (a.k.a. KPIs) related to the online and offline channels: these are your Input variables. Before you throw a correlation analysis at the problem, it may be useful to pause and verify (find experts and ask them questions!) that you have a good grasp of the people, processes and objects involved in the two types of transactions. In other words, make sure you understand your Inputs inside out.
♦ Once you have selected a set of Inputs, you can move on to mapping the Input-Output relationship, i.e. finding the function F. This usually starts with an analysis of the distributions of all Input and Output variables and proceeds through a series of questions: Do the distributions make sense? Are the correlations you measure directionally consistent with what you expected? Are there outliers, i.e. anomalous entries in these distributions? Do you understand the outliers? Etc.
This string of questions generated by your thought process is the essence of data science. It is sometimes called the “scientific method”, the “mathematical approach” or “logical reasoning”, and it always comes down to the art of asking the questions that guide you to the answer. It is part education, part experience, part curiosity. Of the three, curiosity is the hardest to acquire and arguably the key skill in data science.
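The questions from the last step of the walkthrough (do the distributions make sense, do the correlations point the expected way, are there outliers) can be partly scripted. Below is a minimal Python sketch using only the standard library (3.8+), assuming each KPI and the profitability measure X are available as equally long lists of numbers, e.g. one value per month or per business unit. The function names, the `expected_sign` convention and the 1.5 × IQR outlier rule (Tukey's fences) are illustrative choices for this sketch, not a prescribed method.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Five-number summary plus mean: does the distribution make sense?"""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "mean": statistics.fmean(values)}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; it only captures linear association."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def tukey_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def check_inputs(inputs: Dict[str, Sequence[float]],
                 output: Sequence[float],
                 expected_sign: Dict[str, int]) -> Dict[str, dict]:
    """Run the sanity questions for every Input against the Output.

    expected_sign (hypothetical convention) maps an Input name to +1 or -1,
    the direction you expected before looking at the data; Inputs without
    an entry get None instead of a True/False verdict.
    """
    report = {}
    for name, values in inputs.items():
        r = pearson(values, output)
        expected: Optional[int] = expected_sign.get(name)
        report[name] = {
            "summary": describe(values),
            "outliers": tukey_outliers(values),
            "r": r,
            # True only if r is nonzero and has the expected sign
            "direction_as_expected": None if expected is None else r * expected > 0,
        }
    return report
```

Note that the script only flags things; it does not explain them. A flagged outlier or an unexpected sign is not an answer in itself but the prompt for the next question: is it a data-entry error, a one-off event, or something real about the online or offline channel?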
WHAT is the recipe for asking the right questions?
This question is relevant even beyond the realm of data science. Alphabet chairman Eric Schmidt puts curiosity among the top two qualities predicting success. But how do you learn to ask great questions?
There is no universal recipe I know of, but here are three tips to sharpen this skill:
♦ Play the game of asking yourself: if I had unlimited resources (people or money) at my disposal, what would I really want to know? Visualizing that Output is a great technique to gain clarity in your thought process.
♦ Introduce a question-based approach to your communication activities. If you are writing a white paper or a blog piece, or creating a PowerPoint presentation, start by announcing to the audience the questions you will be considering. List those questions on your first page or slide.
♦ Follow “curious” influencers on LinkedIn or Twitter and read interesting thought pieces. Books such as Dataclysm, Freakonomics or The Signal and the Noise contain spectacular analyses built around a diverse array of great questions. For the fun side of data science, check out the FiveThirtyEight or OkCupid blogs.
A centuries-old quote says: “It is easier to judge the mind of a man by his questions rather than his answers.” The author was undoubtedly not thinking of data science, but he might as well have been! The quote still rings true today. With automation rising at an unprecedented speed and machines getting smarter each day (think artificial intelligence), the ability to be curious about the world will continue to set us apart and help us thrive in the future.