Earlier this summer, I mentored a college student doing her first data science internship. She was thrilled to join a large government agency and put her classroom knowledge to work. But when I asked over coffee how it was going, she glumly admitted that it wasn’t what she hoped for.

“I thought I’d do more modeling and visualizations,” she said. “Instead, I’ve spent most of my summer cleaning data within spreadsheets.”

This SOS is common, even for seasoned data scientists.

Data scientist dilemma

A 2016 survey by CrowdFlower found that data scientists spent 60% of their time cleaning and organizing data. Of those same respondents, 57% said that cleaning and organizing data is the least enjoyable part of their work.

Three years later, these industry woes haven’t changed much. It’s true that data science can solve exciting problems, from curing cancer to helping Starbucks recommend the right drinks.

But business leaders often do what’s called a “data dump.” They hand a range of spreadsheets with untagged, unstructured data to their newly hired data scientists and expect them to work magic. In doing so, they outsource the hard work of knowing why they need data scientists at all.

I understand how and why this happens. Broadly speaking, data scientists don’t need to apply for jobs: Demand for them far exceeds supply.

Allen Blue, who co-founded LinkedIn, told Knowledge@Wharton that data science-based jobs saw “massive growth — 15 times, 20 times growth” over the past three years.

What’s driving this demand? A growing gap between business needs and talent pipelines.

Mo’ data, mo’ problems

The rise of internet usage and internet-connected devices left a huge amount of data in its wake. But data itself has no inherent value.

By contrast, people with the skills needed to assess it have a huge amount of value—which is why they come with salaries higher than what many businesses can afford.

Without colleagues who have the skills to clean, model, analyze, and present the most relevant data, business leaders can’t use it for their benefit. So, when a data scientist does come aboard, there’s pressure to outsource everything to them.

But here’s what data scientists can’t do: Dictate your business strategy.

When leaders hire data scientists without giving them a clean slate (i.e. specific data that has been cleaned and mapped back to the business strategy), they don’t set their new hires up for success. It’s a waste of money for businesses, and demoralizing for data scientists—especially when the role requires such advanced levels of education.

Before you hire your first data scientist…

Ask yourself why your business needs one, and how that person will drive your strategy forward. If you find your answer, the next step is to confirm that you have enough clean, labeled data to get started before you bring a new employee through your doors.

Like AI, data science suffers from inflated expectations. Gartner’s latest Hype Cycle for Data Science and Machine Learning shows several technologies—including Spark, Python, and augmented analytics—in the Trough of Disillusionment.

This suggests that businesses aren’t seeing the value they’re expecting from data science. In many cases, it’s their own fault for leapfrogging ahead of the hard strategic work.

Software vendors and media outlets alike often speak of data science as an entity that offers standalone value. But as any data scientist will tell you, doing it right demands a lot of work. It also requires enough clean data and a strong tech stack for whomever you hire to get started.

Luckily, this doesn’t have to be so hard. Gartner predicts that by 2020, a large number of data science tasks will be automated, including collecting and tagging data. Before bringing a data scientist in-house, make sure you’re using the right tools to set them, and your strategy, up for long-term success.

Browse data science software to clean and tag your training data
