In a 2012 article, Harvard Business Review dubbed data scientist the sexiest job of the 21st Century. Its authors, American academics DJ Patil and Thomas H. Davenport, wrote: “If ‘sexy’ means having rare qualities that are much in demand, data scientists are already there. They are difficult and expensive to hire and, given the very competitive market for their services, difficult to retain.”
Demand for data scientists is still booming, so HBR’s assertion arguably holds true today. In January 2019, the online job site Indeed published a report showing a 29% year-over-year increase in demand for data scientists and a 344% increase since 2013.
But there’s an issue: for all the data scientists enterprises are hiring, there’s very little to show for it. Many data and analytics projects fail to go into production, and a lot of enterprises are not seeing a return on investment.
Speaking at the Gartner Data & Analytics Summit 2019 in London, Nick Heudecker, VP Analyst at Gartner, stated: “Data scientists just remind me of a bunch of frustrated weather forecasters.”
He argued it’s becoming evident that the demand for analytics, data science and integration currently exceeds the capability of data and analytics leaders to provide data in a usable form, structure and assured content.
As the explosion in data collection points and data volumes only increases the demand for reconstituting data into usable forms to support analytics and data science, Heudecker believes a new role is emerging that could plug this gap in delivering data from experimentation to production: the data engineer.
The problem with (not so sexy anymore) data scientists
There’s no doubt that a data scientist can bring value to an enterprise: their mathematical and analytical approaches to data can provide valuable insight-driven strategies. But data scientists are spending almost half of their time just getting data ready for projects that won’t make it to production.
According to figures from Gartner, 47% of the time in data science is spent on data collection, preparation and problem analysis rather than on developing models. This highlights the problem: data science teams are being bogged down with issues they were not hired to deal with.
Going back to Heudecker’s frustrated weather forecaster analogy, it’s worth remembering that data scientists don’t actually build or maintain data infrastructures. If those infrastructures aren’t set up efficiently, the work of a data scientist will be in vain.
“There’s a missing role,” argued Heudecker. “It’s no one’s job to consult with the business owners and IT to figure out what to do with data and figure out how it actually works.”
The reemergence of data engineers
For Heudecker, this is where data engineers come in.
“Early on in their lifecycle, we saw data engineers as being essentially chained to data scientists, and that’s not a great place to be chained,” explained Heudecker. “And so we saw a lot of burnout in data engineers, they really just got tired of just being order takers.”
“But despite this beginning, many data scientists soon realised that they lack domain knowledge and, therefore, needed a partner with a lot of understanding around how the business actually uses its data.”
According to Heudecker, Netflix is a notable example. When Netflix wanted to figure out how they could make data a first-class asset within the company, they elevated the concept of data engineering into a stand-alone discipline.
“They had data stored in various cloud systems and on-prem,” he explained. “They wanted to bring all these together and tell a cohesive story around enabling everybody in the company with data, so they created a centre of excellence made up of senior data engineers that sit in Netflix environment and deliver the data.”
How data engineers add value
For Heudecker, data engineering, like any other kind of engineering, is the application of science and math to build things. Essentially, a data engineer’s task is to produce data for multiple consumers.
They are effectively enablers across the business for understanding any kind of data.
Data engineers are better equipped to build critical data pipelines, according to Heudecker. They deliver quality data infrastructure and focus on data integration, modelling, optimisation and quality. By doing this, they support data science teams and the wider organisation.
Data engineers are also having an impact on applications in an operational context: not just analytics, but also new microservice architectures and evolving applications for operational analytics.
Data engineers can build consumable APIs that can be used across an organisation, so their work is very code-centric. They collaborate with business owners and the users of the end products to ensure that the right thing is being built and that these products don’t just languish as interesting science experiments.
Another advantage for enterprises that bring in data engineers, according to Heudecker, is that they can use them for internal marketing. Because data engineers straddle both sides of the story, business and IT, enterprises can use them to advertise new capabilities internally, which can put a positive branding spin on the new features and capabilities on offer.