Analytics Magazine

Will the real data scientist please stand up?

How to navigate the crowded data scientist applicant landscape.

By Nick Pylypiw

Data science has grown dramatically over the last decade. LinkedIn’s 2017 U.S. Emerging Jobs Report identified “machine learning engineer” and “data scientist” as the two fastest-growing jobs. Universities are struggling to keep up, assembling new programs to meet the growing need for data science professionals. Many schools even offer accelerated online programs to make the investment more manageable and appealing for prospective students.

Meanwhile, companies of all shapes and sizes are competing for the attention of these newly popular “nerds,” offering large salaries and flexible work schedules. With universities churning out talent at breakneck speed, one might expect the industry’s appetite for data scientists to be satisfied. However, the gap between supply and demand appears to be growing.

One problem seems to be a lack of quality, not quantity, and determining an individual’s data science qualifications is very difficult. This ambiguity lets anybody put “data scientist” on a business card with little to no accountability. Though there are imposters in every field, the current bumper crop of data scientists seems to be littered with applicants who cite only a few Kaggle submissions as evidence of technical experience. To be clear, Kaggle competitions are hotly contested and routinely generate high-quality work. However, their problems are generally well defined, with relatively little data sourcing and preparation required. That leaves only the modeling piece, which, at the risk of sounding blasphemous, is arguably the easiest part of the process.

In addition to the folks above, there are those whose employers have stretched them into uncomfortable territory. To cope with the difficulty, expense and culture shift of standing up a data science team, some companies cut corners by rebranding their best business analysts and IT professionals as data scientists. Without proper training and guidance, these makeshift brain trusts quickly find themselves out of their depth, resulting in missed deliverables, unanswered business questions and, ultimately, the loss of critical stakeholder support. This fairly common practice of deliberately putting square pegs into round holes is unfair to the individuals and rarely succeeds. It also dilutes the field, fostering further skepticism about the benefits a properly structured data science team can provide.

Of course, the growing mountain of resumes is not all imposters. There are plenty of capable and talented data scientists in the pile as well. Besides the hordes of newly minted graduates emerging from university programs, there is a large group of statistical modelers who have been doing “data science” since before it had that name. These analytics practitioners have the technical skills and business experience to succeed in any industry. However, they can be difficult to spot, partly because their resumes lack the buzzwords that recruiting algorithms typically search for.

So what, exactly, does a real data scientist look like? How can you find a ruby in a mountain of rocks? Is there a Coupe de Ville hiding at the bottom of that Cracker Jack box? At Elicit, we believe in an approach called “Geek Nerd Suit.” This framework is integral to how we build our teams and approach our work. Aspiring data scientists are assessed on three core competencies: technology and data (Geek); insights and modeling (Nerd); and strategy and communication (Suit). In our experience, balance across these three pillars is directly correlated with success, though data scientists typically over-index slightly in the “Nerd” category.

Nerd

A growing number of software tools make machine learning extremely easy. Even outside these point-and-click interfaces, actually constructing a machine-learning model is simple, often a one-liner in R or Python. This ease of use can create inflated confidence in both the results and the process itself, even prompting some companies to question why the data science skill set is needed at all.

There is real danger in this sort of “drive-thru” analytics, however, because out-of-the-box models offer limited options and flexibility. Anybody can copy a line of code from Stack Overflow and build a model. What happens before and after that one line of code is what makes a data scientist valuable. Beyond knowing why a certain algorithm suits a specific business problem, a data scientist must be able to explain how the model works and how to tailor its specifications to the use case at hand. This requires significant knowledge of mathematics and statistics. Without proper context, a data scientist could recommend business action based on egregious misinterpretations of a model’s output.

Suit

The technical skills discussed above are obviously crucial, but they are worthless without the ability to translate business needs and goals into questions that data science can answer. Business partners can be skeptical of machine learning, especially when its insights contradict their existing views. Convincing them that the data science team’s insights are likely to be more reliable than anecdotes and the status quo can be challenging.

An effective data scientist knows how to weave model insights into a story, allowing business partners to see the strategic benefit and, perhaps more importantly, to feel comfortable contributing to the conversation. Without the necessary communication skills, a data scientist can come across as condescending or dismissive.

Geek

Data is rarely collected, organized, stored and shaped in a model-ready state. In large organizations, the IT team is sometimes tasked with creating the appropriate tables for reporting and data science projects. Stepping through the formal process of ticket creation, prioritization, delegation and ETL (extract, transform, load) for every request creates delays that the business cannot afford.

These processes are important and necessary, of course, but they are not suited to the iterative work required to properly implement a machine-learning algorithm. As the team works toward a solution, it might pivot several times, deciding that weekly aggregation works better than monthly, that the marketing data is needed after all, or that another source of account information has fewer missing values.
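The weekly-versus-monthly pivot is the kind of change a sandbox makes cheap. The minimal Python sketch below assumes hypothetical daily sales records (invented values) and a hypothetical helper, `aggregate`, to show that switching aggregation levels can be a one-argument change when the data scientist controls the transformation.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily sales records: (date, amount), 60 days from Jan. 1, 2018.
start = date(2018, 1, 1)
daily = [(start + timedelta(days=i), 100.0 + i) for i in range(60)]


def aggregate(records, key_fn):
    """Sum amounts into buckets keyed by key_fn(date), sorted by key."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[key_fn(day)] += amount
    return dict(sorted(totals.items()))


# ISO (year, week) buckets versus (year, month) buckets.
weekly = aggregate(daily, lambda d: tuple(d.isocalendar()[:2]))
monthly = aggregate(daily, lambda d: (d.year, d.month))

print(len(weekly), "weekly buckets;", len(monthly), "monthly buckets")
```

Both views reconcile to the same grand total; only the grain changes. In a production pipeline, that same change might mean a new ticket and a new table.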

Rather than rely on IT to rebuild these tables over and over, many companies prefer to set up a sandbox environment where the data science team can build its own tables, adding columns and sources as needed, before passing these plans to IT for production. This requires data scientists who are comfortable working with large and varied data sources and who have the coding skills (generally SQL) to explore and join data in that environment. A poorly written SQL query can bring down a server and disrupt every other process in the environment. Perhaps worse, the results of such a query may rest on incorrect sources, joins and filters, producing misleading data.
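To illustrate how a join can quietly produce misleading numbers, here is a small, self-contained sketch using Python’s built-in `sqlite3` module with two hypothetical tables. Because the account table unexpectedly holds a duplicate row, a naive join counts some orders twice. Checking the join key for duplicates, and reconciling the joined total against the source table, are cheap safeguards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: one row per order, plus an account lookup table that
# (unexpectedly) contains two identical rows for account 1.
cur.executescript("""
    CREATE TABLE orders (order_id INTEGER, account_id INTEGER, amount REAL);
    CREATE TABLE accounts (account_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
    INSERT INTO accounts VALUES (1, 'East'), (1, 'East'), (2, 'West');
""")

# Check 1: is the join key actually unique in the lookup table?
dupes = cur.execute("""
    SELECT account_id, COUNT(*) FROM accounts
    GROUP BY account_id HAVING COUNT(*) > 1
""").fetchall()

# Naive join: account 1's orders match two rows each, so they are double-counted.
naive = cur.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN accounts a ON o.account_id = a.account_id
""").fetchone()[0]

# Deduplicate before joining. This only works because the duplicates are exact
# copies; conflicting duplicates would need a business rule to pick one.
deduped = cur.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN (SELECT DISTINCT account_id, region FROM accounts) a
      ON o.account_id = a.account_id
""").fetchone()[0]

# Check 2: the joined total should reconcile with the orders table alone.
expected = cur.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
conn.close()

print(dupes)                       # [(1, 2)]
print(naive, deduped, expected)    # 375.0 225.0 225.0
```

The naive query runs without error and returns a plausible-looking number, which is exactly why it is dangerous.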

An interview is usually not enough time to screen for competence in these areas, so how can we make sure we are hiring quality data scientists with the skills to succeed? One method some companies rely on is a mini project. If executed properly, this exercise can tell you a great deal about a potential candidate’s capabilities.

Mini Project

A week or so before the interview, send the applicant a few data sets and a relatively vague business context. Ask them to prepare an analysis relevant to the business question, along with a 15-minute business presentation. During the conversation, use specific, probing questions to dig for the “Geek Nerd Suit” competencies. Here is what to look for:

Strategy (Suit). What questions were being answered, and were they relevant to the overall business context provided? How was the solution organized?

Data preparation (Geek). What steps were taken to prepare the data? Were tables joined together or transposed? Were there any missing or anomalous observations? If so, how were these handled? Were new fields created from existing ones? Most importantly, why were these steps taken?

Modeling (Nerd). What algorithms or techniques were used? Were they supervised or unsupervised? Why were these particular techniques chosen? How is the output from these models interpreted?

Presentation (Suit). Are the insights accessible to someone without the technical vocabulary? How were discussions of the models, insights and data handled? Were the findings clear?

Vision (Geek/Nerd/Suit). What are the next steps for this analysis? What new questions did this work raise? What additional steps could improve the results? Are there additional data sources that could strengthen the model?
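For the data-preparation questions, it helps to know what a basic first pass looks like. The sketch below, on invented customer records, counts missing values per field and flags values outside the common 1.5 × IQR fences. The helper names are hypothetical, and the IQR rule is just one simple heuristic among many, not something the mini project requires; which anomalies matter, and how to handle them, is exactly what the candidate should be able to explain.

```python
from statistics import quantiles

# Invented candidate data set: one dict per customer record.
records = [
    {"customer_id": 1, "age": 34, "spend": 120.0},
    {"customer_id": 2, "age": None, "spend": 95.0},
    {"customer_id": 3, "age": 29, "spend": 110.0},
    {"customer_id": 4, "age": 41, "spend": None},
    {"customer_id": 5, "age": 38, "spend": 5000.0},
    {"customer_id": 6, "age": 45, "spend": 130.0},
]


def missing_counts(rows, fields):
    """Count None (missing) values per field."""
    return {f: sum(1 for r in rows if r[f] is None) for f in fields}


def iqr_outliers(rows, field, k=1.5):
    """Return customer_ids whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[field] for r in rows if r[field] is not None]
    # "inclusive" treats the data as the whole population, which behaves
    # more sensibly than the default "exclusive" method on tiny samples.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r["customer_id"] for r in rows
            if r[field] is not None and not low <= r[field] <= high]


print(missing_counts(records, ["age", "spend"]))  # {'age': 1, 'spend': 1}
print(iqr_outliers(records, "spend"))             # [5]
```

Flagging customer 5 is only the start: a strong candidate would ask whether that spend is a data-entry error or a genuinely valuable customer before deciding to drop, cap or keep it.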

A candidate who can demonstrate competence in the above areas is likely to be successful in your organization. Mastery in all three areas isn’t required, of course, but it is a strong indicator of the level of performance the individual would bring to a business setting.

Nick Pylypiw is a manager on the Data Science team at Elicit, LLC, a customer science and strategy consultancy that helps clients uncover latent insights about their customers, and apply those insights to business, marketing, product, loyalty, brand and customer experience strategy. The company’s Fortune 500 clients include Southwest Airlines, HomeAway, Fossil, GameStop, Sephora, BevMo! and Pier 1 Imports.
