Big data in the cloud – where next?

Big data is to the information age what the steam engine was to the industrial revolution. From expanding customer intelligence to improving operational efficiency, big data has revolutionised marketplaces.

IDC predicts that by 2020 we will create 44 zettabytes, or 44 trillion gigabytes, of data annually. To transform big data into information that drives business value, it needs to be analysed. Unsurprisingly, crunching big data requires seriously big compute and a solid infrastructure to support it.

The data boom

Our insatiable desire for data insights did not begin with big data. Back in the tech boom of the 1990s, business intelligence (BI) tools were the next big thing. BI allowed organisations to report on and analyse company data, but these systems were confined to dedicated data warehouses running on specialist servers. As a result, BI was too expensive and too technical for the majority of enterprises.

The internet created a data boom that further drove up the cost of storing and processing data. The term ‘big data’ was first used in 1997, when NASA researchers warned that the growth of data sets was becoming a problem for the computer systems of the time.

Hadoop was created in 2006 out of the need for new systems to handle the explosion of data from the web. It disrupted the market by enabling large data sets to be analysed on commodity hardware, delivering significant cost savings and scalability to businesses.

The ability to run data analytics on commodity hardware, coupled with the fact that Hadoop is open source, levelled the playing field and allowed more organisations to start applying analytics to their data. Yet even with commodity servers, growing data volumes brought growing capacity demands, creating a new set of big data challenges.

Cloud control

The rise of the cloud has been as transformational as the evolution of big data, so combining the two technologies is a logical and powerful proposition. Yet despite the clear benefits of scalability and reduced capital expenditure, it was not until 2010 that the first cloud-based big data solutions appeared.

Now they are being adopted with gusto. The cloud removes the significant upfront investment required for data centres and enables companies to pay for IT infrastructure as they need it.

Cloud computing offers a cost-effective way to support big data technologies and the analytics that drive business value. As well as reducing overheads through the automation of the components needed to run big data workloads, cloud servers can be scaled out to match processing requirements, making big data deployments easier.

Big data processing often requires huge compute power for brief periods, and consequently demands the rapid provisioning of servers, known as virtual machines (VMs). It is unsurprising, therefore, that IDC predicted cloud infrastructure would be the fastest-growing sub-segment of the big data market, with a compound annual growth rate of nearly 50% between 2013 and 2017.

Not all clouds are created equal

While the cloud has lowered the management burden and costs, helping to make big data more efficient and effective, a number of inefficiencies still creep in.

While VMs in the cloud can be set up quickly, doing so requires constant monitoring. As a result, the amount of compute a big data job will need has to be estimated in advance and VMs provisioned accordingly. Compute demand can be difficult to predict, so big data projects are often under- or over-provisioned.

If a company’s estimates are out, VMs can overload and crash. Companies therefore usually err on the side of caution and provision more VMs than they are likely to use, which means traditional cloud big data deployments still tend to waste money.
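As a rough sketch of why this estimation problem leads to waste, consider the sizing exercise below. Every figure is invented for illustration (none comes from this article or any provider's pricing), but it shows how provisioning fixed-size VMs against an uncertain peak forces a choice between idle capacity and overload.

```python
import math

# Hypothetical sizing exercise: every figure here is an invented example.
expected_peak_jobs = 40   # best guess at peak concurrent processing tasks
jobs_per_vm = 5           # how many tasks one fixed-size VM can handle
safety_margin = 1.5       # over-provision by 50% to reduce the risk of crashes

# Fixed-size VMs must be provisioned up front against the estimate.
provisioned_vms = math.ceil(expected_peak_jobs * safety_margin / jobs_per_vm)

# If real demand comes in below the estimate, the margin is wasted capacity...
actual_peak_jobs = 30
used_vms = math.ceil(actual_peak_jobs / jobs_per_vm)
print(f"Idle capacity: {provisioned_vms - used_vms} of {provisioned_vms} VMs")

# ...and if it comes in above the provisioned capacity, the VMs overload.
surprise_peak_jobs = 70
shortfall = surprise_peak_jobs - provisioned_vms * jobs_per_vm
print(f"Jobs beyond provisioned capacity: {max(shortfall, 0)}")
```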

Research has shown that, at best, VMs are 50% utilised by typical workloads. This means that customers running servers for testing and development could save at least 50% by using auto-scaling containers rather than fixed-size VMs.

Containerised servers, or containers, scale instantly with load. Unlike other cloud servers, they fluidly resize to exactly the capacity needed to handle the current workload. As a result, no manual admin work or complex tooling is needed to scale servers, saving resources and improving usability.

Furthermore, because containers auto-scale, their usage can be metered, enabling utility-style, usage-based billing. Businesses pay only for what they use, rather than for rigid capacity increments, so they can scale as needed without worrying about the cost of over-provisioning.
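To make the billing difference concrete, here is a small sketch comparing the two models over a bursty workload. The hourly load profile and unit price are hypothetical, and real providers bill in their own increments, but the shape of the saving is the same: the VM is billed for peak-sized capacity every hour, while the container is billed only for what is actually used.

```python
# Hypothetical hourly usage (GB of RAM actually consumed) for a bursty
# big data job: mostly quiet, with short spikes of heavy processing.
hourly_usage_gb = [2, 2, 2, 14, 16, 15, 2, 2, 2, 2, 12, 2]

PRICE_PER_GB_HOUR = 0.01  # assumed unit price, identical under both models

# Fixed-size VM: sized for the peak and billed for that capacity every hour.
peak_gb = max(hourly_usage_gb)
vm_cost = peak_gb * len(hourly_usage_gb) * PRICE_PER_GB_HOUR

# Auto-scaling container with usage-based billing: pays only for actual usage.
container_cost = sum(hourly_usage_gb) * PRICE_PER_GB_HOUR

print(f"Provisioned for peak (VM): £{vm_cost:.2f}")
print(f"Billed on usage (container): £{container_cost:.2f}")
print(f"Saving: {1 - container_cost / vm_cost:.0%}")
```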

Containers are better suited to big data deployments because they flexibly scale to load and enable usage-based billing, saving businesses significant costs. In light of this, containers are rightly heralded as the latest technology to power the big data deployments of the future.

 

Sourced from Richard Davies, CEO, ElasticHosts
