The challenges of adopting Hadoop

Hadoop is poised to take Europe by storm – at least judging by the amount of money US venture capitalists are pouring into it.

Take Cloudera, for example. In December 2012, the Silicon Valley-based Hadoop specialist announced it had received a fifth round of funding, worth $65 million, a chunk of which was specifically earmarked for international expansion. A few months later, the company opened its new EMEA headquarters, located in the heart of London’s Tech City.

“Demand for Hadoop in the European market is exploding,” said CEO Mike Olson at the new office’s opening, “providing fertile ground from which to grow our business and support the proliferation of Apache Hadoop in enterprises across the region.”

Cloudera’s competitors are also taking aim at Europe. In 2012, MapR opened offices in Windsor and Munich. Hortonworks, meanwhile, plans to open a UK office some time this year.

All three companies offer their own “enterprise” version of the open source Apache Hadoop framework, which distributes analytic workloads across large clusters of commodity servers.

The software itself is essentially free, and continually improved through the voluntary efforts of a vast online community of developers. But each firm’s distribution is frozen in time and packaged with tools and support services that give enterprise customers the confidence and capabilities they need to deploy it in production environments.

Muscling in

Meanwhile, the giants of IT are trying to get in on the act. In December last year, EMC and VMware announced a new spin-off company called Pivotal. Since then, it has launched its own distribution of Hadoop, Pivotal HD. And in April 2013, it unveiled Pivotal One, a platform-as-a-service (PaaS) cloud offering that combines this distribution with application infrastructure from VMware and hardware from EMC’s Greenplum.

IBM’s Infosphere BigInsights, meanwhile, combines Hadoop with the IT giant’s proprietary business intelligence technologies. It also recently announced BigSQL, which allows companies to query Hadoop through industry-standard SQL and SQL-based applications.

And in February 2013, chipmaker Intel announced its own Hadoop distribution, in a move intended to accelerate adoption of the platform by ensuring that Hadoop workloads run faster on Intel’s Xeon chips. Available in China for some time, it is now generally available in the US and will be rolled out in Europe this year, according to Alan Priestley, Intel’s director of strategic marketing in EMEA.

All three will clearly be eying Europe as a fertile market for their new Hadoop-based product sets.

But while there may be ample interest in Europe, potential customers still face two thorny challenges if they are going to introduce Hadoop-driven big data analytics into day-to-day operations: recruitment and procurement.

Skills shortage

The most testing of these challenges may well be recruitment, as Hadoop-related skills are in high demand and short supply.

A study by business intelligence vendor SAS found that, in the UK alone, demand for employees with Hadoop skills rose 210% during 2012.

Meanwhile, individuals conversant in key Hadoop technologies, such as MapReduce, Pig and Hive (see A Hadoop Glossary), find themselves in major demand, according to Neil Toms, a senior consultant at recruitment consultancy Harvey Nash.

“There’s already a real fight for talent going on here and, right now, experienced Hadoop developers can pretty much get whatever they ask for in the current market,” he says.

To get a feel for the balance of supply and demand in the market, Toms recently took a look at the Jobserve online recruitment site. “There were four new CVs posted by candidates with Hadoop skills in the last month – and 50 advertisements from employers looking to hire Hadoop contractors.”

These contractors easily charge anything between £500 and £650 for a day’s work, Toms reports.

“There’s no point me even talking about salaries, because employers will find it almost impossible to find anyone looking for a permanent role,” he says. “Why would they, when contracting is so lucrative and there’s a steady stream of available, well-paid work and always a bigger and more interesting project to move on to?”

Indeed, as one member of the UK Hadoop User Group told Information Age via an anonymous survey: “I get so many phone calls from recruiters, it’s not funny!”

However, other members reported that, in the UK at least, Hadoop adoption is still rather niche.

“[Adoption] looks to be quite high in the start-up and Internet areas, but there’s no real take-up in the mainstream sectors,” said one. “Lack of expertise and lack of solid case studies are probably the reason.”

The need for training and education is an opportunity that the Hadoop distribution companies are keen to exploit.

Cloudera, for example, has trained 15,000 people worldwide in a range of big data topics through its ‘Cloudera University’ programme, says CEO Mike Olson, and issued 5,000 developers with technical accreditation.

In Europe, Cloudera is actively recruiting a network of training partners, he says. “We've taken an aggressive partnership approach to education, because we realise we need to educate as quickly and broadly as possible.”

Education is also an opportunity for smaller start-ups. Sixteen months ago, former PwC management consultant Mike Merritt-Holmes co-founded the Big Data Partnership, a company that provides big data consulting services and Hadoop training courses.

Right now, the business is still small, with just 12 people in its London offices, but it’s trained over 200 people across Europe, including over 100 at CERN, the European Laboratory for Particle Physics and home of the Large Hadron Collider.

The Big Data Partnership offers accreditation in the Hortonworks and MapR distributions of Hadoop and partners with Microsoft for its HDInsight course, based on the software giant’s implementation of the Hortonworks distribution on its Azure cloud platform.

It also runs one-day Hadoop and Big Data ‘masterclasses’ for non-technical business users. “A lot of this stuff really requires a different mindset from business executives and they need to be able to explore the ‘unknown unknowns’ that exist in the data they collect,” says Merritt-Holmes.

The Big Data Partnership does not currently offer training or accreditation in Cloudera (arguably the most established of the three Hadoop distributors), but “I’m sure that’s coming soon,” says Merritt-Holmes.

Hadoop choices

A lot of companies are currently kicking the tyres of Hadoop, Merritt-Holmes says. “They’re trying to work out where Hadoop sits and what they might be able to do with it,” he says.

He predicts that once the first wave of experimentation is complete, more formal procurement exercises will kick in.

“It's taking a little while to happen and, in reality, we're probably another six months away before we see big companies getting business cases and budget signed off for large implementations,” Merritt-Holmes says. “But at that point it'll be a natural step to evaluate the distributions and see which one fits their needs best.”

That will be when the second big challenge kicks in: choosing between the three independent “enterprise” Hadoop distributions or opting for one of the enterprise IT giants’ offerings.

Few buyers choose to work with the raw – but free of charge – open source code for Hadoop, says Merv Adrian, an analyst at IT market research company Gartner. That takes time, resources and expertise that few are willing to devote to the task, he says. It also demands a commitment to ongoing internal support that would involve evaluating new releases from the wider Hadoop community on a more or less constant basis before adding them to their production environment.

But while commercial distributions of Hadoop bring a level of pre-integration and support, they also have their challenges.

“Commercial distributions include different projects along with the core Hadoop projects, and no commercial distributions include or support all available projects,” he says.

In other words, just because a particular Hadoop enhancement has been built by the open source community, that does not mean a Cloudera customer or a Hortonworks customer will necessarily benefit.

Furthermore, these independently developed niche enhancements may not integrate with each other easily. “Distributions also have varying release levels of the included projects and update them at different rates,” Adrian says.

“Thus, data management leaders run the risk of choosing a Hadoop solution that doesn’t meet enterprise needs.”

For that reason, he recommends working with one (or possibly two) of the commercial open source providers. “Buying an Apache Hadoop distribution is no different from buying any other software product,” Adrian says. “The enterprise will probably run the solution for several years at least, so data managers should make sure the vendor they choose can sustain a productive relationship.

“They should look at the vendor’s viability, support capabilities, partnerships and future plans for the technology. Above all, talk to reference customers.”

For all the vendors’ talk of a ‘fertile market’ and an ‘explosion’ of Hadoop adoption in Europe, there is still a rocky path for adopters to tread before Hadoop can be considered a mainstream element of any enterprise IT infrastructure.
