Open source storage is an emerging phenomenon: data storage software that is developed in a public, collaborative manner under a licence that permits the free use, distribution and modification of its source code.
Organisations are now dealing with huge amounts of data, petabytes' worth, and it all needs to be stored in a manner that is flexible, accessible and secure, while still allowing analytics and intelligence-driven solutions to extract actionable insights from it.
Three trends have given rise to open source storage, and Stephen Manley, chief technologist at Druva, has helped Information Age dissect them.
1. Cloud architectures and open source storage
“Cloud architectures and open source storage have bounced off each other and grown based on each other,” said Manley.
He pointed to two of the most popular open source storage offerings, Ceph and Lustre. Both have a very different architecture from earlier file systems such as EXT4 or ZFS. One of the big differences is that they store the data itself as objects in a large object store and keep the metadata in a separate database. Manley sees this as a sign that they are building for a cloud architecture, “because cloud storage only started being anchored on object storage and then offers database services on top”.
This shows that open source storage providers have recognised how much the storage landscape, and what it means to build for the cloud, has changed.
The flip side is also true.
“If you look at AWS, for example, I believe the first thing it supported with the FSx initiative was Lustre. And to me that reflects one of the things that open source does so well. It is very good at focusing in on very specific problems, often very vertical market oriented.
“Lustre is very good at high performance computing, and the cloud providers looked at that and said because that’s such a popular offering, we should just enable this in our cloud rather than necessarily building something to directly compete with it.”
The example illustrates what open source does well: a) finding the niches where there is a very specific problem to be solved, b) creating an architecture to solve it, and c) building that architecture so it works both on-prem and in the cloud.
2. Open source in the analytics space
Open source is also increasingly influential in the analytics space.
The analytics space has evolved beyond tools such as Hadoop and MapReduce, which were heavily text-oriented and centred on big data lakes, towards an understanding that the world is shifting to what is termed small data sprawl. The proliferation of IoT devices, remote sites and offices means that organisations want to process and analyse data where it is generated, while enriching it with information from the centre.
This change has brought many more vertical offerings that integrate analytics with the storage itself.
Manley explained: “Somebody doesn’t just want to store data for IoT. The point of IoT is that I’m processing and analysing, and we’re seeing a lot more integrated pipelines, of which storage becomes a component. And open source is by far the most popular way, whether you look at Spark or Elasticsearch, because they can evolve quickly and people can adjust them to meet the specific needs of their particular industry.”
3. Open source storage driving intelligence
As cloud storage improves, providers have to keep innovating, and open source can foster that innovation. One of the big areas of innovation is building intelligence into the storage itself.
In the past, intelligent storage meant fundamental features such as taking snapshots of the stored data and replicating it. Today, however, storage systems (Ceph, again, is a good example) separate the metadata from the data. This makes it much easier to analyse the metadata for classification, analytics, problem detection or compliance.
“Today, open source storage plays a role in adding higher value, not just in terms of core services, but also in intelligence about the data that they’re holding for the customers,” Manley added.
Open source storage is a product of the era of small data sprawl and multi-cloud. Its flexibility, and the innovation built on top of it, facilitate analytics and allow organisations to gather intelligence from the data within their virtual storage units.
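To make the metadata point concrete, here is a minimal, hypothetical sketch in Python. It is not how Ceph or Lustre work internally; the class `ToyObjectStore`, its methods and its metadata columns are invented purely for illustration. Object bodies sit in an in-memory dictionary standing in for the object pool, while metadata lives in a separate SQLite database, so classification and capacity questions can be answered without reading a single object.

```python
import hashlib
import sqlite3


class ToyObjectStore:
    """Hypothetical toy store: object bodies in one place, metadata in a separate database."""

    def __init__(self) -> None:
        # Object bodies keyed by content hash, standing in for a large object pool.
        self.objects: dict[str, bytes] = {}
        # Metadata lives in its own SQL database and can be queried on its own.
        self.meta = sqlite3.connect(":memory:")
        self.meta.execute(
            "CREATE TABLE meta (name TEXT PRIMARY KEY, oid TEXT, size INTEGER,"
            " owner TEXT, classification TEXT)"
        )

    def put(self, name: str, body: bytes, owner: str, classification: str) -> None:
        # Store the body under its SHA-256 hash and record a metadata row pointing at it.
        oid = hashlib.sha256(body).hexdigest()
        self.objects[oid] = body
        self.meta.execute(
            "INSERT OR REPLACE INTO meta VALUES (?, ?, ?, ?, ?)",
            (name, oid, len(body), owner, classification),
        )

    def get(self, name: str) -> bytes:
        # Look up the object id in the metadata, then fetch the body from the object pool.
        row = self.meta.execute("SELECT oid FROM meta WHERE name = ?", (name,)).fetchone()
        if row is None:
            raise KeyError(name)
        return self.objects[row[0]]

    def names_with_classification(self, classification: str) -> list[str]:
        # Compliance-style query answered from metadata alone; no object body is read.
        rows = self.meta.execute(
            "SELECT name FROM meta WHERE classification = ? ORDER BY name",
            (classification,),
        )
        return [r[0] for r in rows]

    def bytes_by_owner(self) -> dict[str, int]:
        # Capacity analysis per owner, again computed only from the metadata table.
        rows = self.meta.execute(
            "SELECT owner, SUM(size) FROM meta GROUP BY owner ORDER BY owner"
        )
        return {owner: total for owner, total in rows}


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("invoices/2023.csv", b"id,amount\n1,100\n", "finance", "confidential")
    store.put("logs/app.log", b"started\n", "ops", "internal")
    store.put("hr/salaries.csv", b"name,salary\n", "hr", "confidential")
    print(store.names_with_classification("confidential"))
    print(store.bytes_by_owner())
```

The design choice being illustrated is simply the split: because the metadata table is independent of the object bodies, a classification or compliance scan scales with the number of metadata rows rather than with the volume of stored data.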