In the era of big data, organisations want to milk every bit of value they can from their data sources, but no one wants business processes bogged down by bloated legacy data. So despite the range of new storage solutions available, the old problem of balancing deletion against retention hasn't gone away. Throw compliance regulation into the mix and organisations face a hefty task: classifying their data, establishing policies to archive it, securing its confidentiality and integrity, and correctly disposing of the chaff. According to Gartner, unstructured data is doubling every year, with less than 10% of the files in a data centre active.
Information Lifecycle Management (ILM) is the best-practice answer to establishing a data governance framework around all of this. But many are having to ask whether old ILM practices can still cut it when their data lives in a cloud.
As Andrew Tang, service director, security at data management service provider MTI, explains, there is understandable confusion as to how to apply these methods in a cloud or hybrid cloud environment. The first question companies should ask is whether to archive in the cloud or in traditional storage.
'On the one hand, archiving actual, physical data is good as you have a definitive, secure way of holding valuable data,' says Tang. 'However, this causes some problems should you wish to access the data further down the line: will you still have hardware that is capable of accessing such outdated data? Also, archiving data in physical storage can often be seen as an expensive money pit, more so if the data is never accessed again.'
It's no wonder the cloud is emerging as a worthy option for archiving – retired data sapping internal IT resources does nothing to help a business grow, and the temptation to just stick it in a cloud and forget about it is a strong one. Solutions such as Amazon Glacier have emerged specifically for that purpose, allowing companies to store their less frequently accessed data cheaply where slower retrieval times can be tolerated.
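In practice this kind of archiving is usually driven by lifecycle rules rather than manual moves. As a minimal sketch, assuming an AWS setup with boto3 (the bucket name, prefix and 90-day threshold below are illustrative, not taken from the article), a rule that transitions cold objects to Glacier might look like this:

```python
import boto3

# Minimal sketch: bucket name, prefix and day threshold are assumptions.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-cold-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "retired/"},
            # Objects older than 90 days move to the cheap, slow tier
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)
```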
Transfer fee
However, some clouds have a grey lining when it comes to managing costs, as Jeff Tabor, senior director of product management and marketing at edge-core storage specialist Avere, warns.
'Moving data between storage tiers of a single cloud provider or between storage tiers of multiple cloud providers can incur data transfer fees, especially if the data is moved at high bandwidth,' he says. 'Plus storage tiers in the virtualised/cloud environment are typically separated by substantial latency and the customer can incur financial costs moving between the tiers.'
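Those fees add up quickly at archive scale. As a back-of-envelope illustration (the per-GB rate below is an assumed placeholder, not a quoted price from any provider):

```python
# Back-of-envelope transfer-fee arithmetic; the rate is illustrative only.
data_moved_gb = 10_000        # 10 TB migrated between tiers or providers
egress_rate_per_gb = 0.09     # assumed placeholder fee in USD per GB

print(f"One-off migration cost: ${data_moved_gb * egress_rate_per_gb:,.2f}")
# One-off migration cost: $900.00
```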
> See also: Keys to the castle: encryption in the cloud
This latency typically results in data movement that is substantially slower than with traditional storage, which may not be an issue with simply storing inactive data but presents challenges for ILM. Some would argue that to properly implement ILM, at least three tiers are needed with a seamless way to move data between them.
Organisations should be 'working with a provider that has multiple tiers in its data storage solutions,' says Sean Jennings, SVP of solution architecture at IaaS vendor Virtustream. 'You should utilise a high-performance tier for data that is accessed routinely and where response times are material to accessing the data. Conversely, you want a lower-performance tier to access older, less critical data where response time is not as significant and cost is the overriding consideration.'
'It is also helpful if your Content Management System (CMS) categorises data and can automatically migrate data between different tiers depending on relevance and frequency of access.'
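The migration rule Jennings describes usually reduces to a policy keyed on access recency. A hypothetical sketch of such a rule (the tier names and age thresholds are invented for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical tiering rule of the kind a CMS might apply;
# tier names and age thresholds are invented for illustration.
def choose_tier(last_accessed: datetime, now: datetime) -> str:
    age = now - last_accessed
    if age < timedelta(days=30):
        return "high-performance"   # routinely accessed, latency matters
    if age < timedelta(days=365):
        return "low-cost"           # older data, cost is the priority
    return "archive"                # rarely touched, slow retrieval tolerated

print(choose_tier(datetime(2015, 1, 1), datetime(2015, 6, 1)))  # low-cost
```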
The flexibility of any given cloud provider is key to smooth-running ILM. With a flexible enough provider, the cloud should offer companies a solution that is bespoke to the business, with the ability to manage the bandwidth they need to archive or retrieve data on their own terms.
'Having a ‘one size fits all’ approach certainly won’t work for many businesses, and especially if you are keeping historical data in the cloud you don’t want to be paying for access when you don’t need it,' advises Tang.
Horses for courses
Having the business understanding to know what solution would be suitable for each class of data, says Tang, 'is what differentiates a good cloud provider' and certainly one of the first questions a company should ask when entering into discussions.
'In terms of having to pick one, I think we’re on the cusp right now as more businesses turn to cloud storage, which offers a better degree of flexibility and management to store retired data.'
And for true cost effectiveness, saying goodbye to the 'one size fits all' approach also pays off. Organisations might find benefits in keeping their options open regarding which cloud providers they use for their next set of historical data, giving them the bargaining power they need to get the best price, according to Tabor.
Aside from cost, one of the most important considerations when choosing storage for archived data is how private and sensitive that data is, and how well protected it needs to be.
'Cloud-based data storage is an almost inevitable option for companies trying to manage a surge in data volume, variety and velocity but it can be easy to forget that all that information still ends up being stored in a physical location somewhere,' says Christian Toon, head of information risk at security firm Iron Mountain. 'Data centres are not infallible: they can suffer power outages, flood or fire, for example. In the worst-case scenario, this can lead to data corruption and loss. It is therefore vital to ensure that important information always has a secondary back up.'
Added to this, the nature of virtualised environments implies a high degree of portability of the underlying application and operating system.
According to Jennings, 'this introduces potential vectors for attack and/or confiscation. In order to ensure the data in a virtualised environment remains private you must ensure that the data is encrypted at rest, in flight and even in use.'
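Encryption at rest can begin on the client side, before data ever reaches the provider. A minimal sketch using the Python cryptography library's Fernet primitive (key handling is deliberately elided; in a real deployment the key would live in a KMS or HSM, never alongside the data it protects):

```python
from cryptography.fernet import Fernet

# Minimal client-side encryption sketch; real deployments keep the key
# in a KMS/HSM, never next to the data it protects.
key = Fernet.generate_key()
fernet = Fernet(key)

ciphertext = fernet.encrypt(b"customer record destined for the archive")
# Only ciphertext is uploaded; the provider never sees the plaintext.
assert fernet.decrypt(ciphertext) == b"customer record destined for the archive"
```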
A recent study by Iron Mountain found that 86% of business leaders in Europe believe they relinquish responsibility for the security of their data once it's stored in the cloud, but EU law places accountability for lost or compromised data firmly in the hands of its owner.
> See also: The case for virtual disaster recovery
Ultimately, data remains the responsibility of its custodian, and IT departments take a risk when they think they can leave it in the cloud to gather dust without the proper security considerations, even if it's inactive. The type of data also presents a risk, depending on how valuable it is to the business.
As Tang points out: 'Understanding who ultimately holds the final responsibility for the data, in light of a breach, will give you a greater degree of control if you need to make a difficult decision.'
That being said, selecting a vendor the end user can have confidence in, one that has proactively sought security certifications and is transparent about its security posture, will give IT departments the logs and governance information they need to properly lock down their data.
Seeing through the cloud
Transparency is also a key factor in assessing the performance of any cloud provider, as any ILM system will only ever be as effective as the data centre environment it runs on top of. As Jay Prassl, VP of marketing at SSD storage firm SolidFire, explains, this means knowing the infrastructure of your cloud provider.
'Traditional models for ILM often rely on SSD storage for 'hot' data and large spinning-disk storage for data that is rarely accessed,' says Prassl. 'This architecture can be quite efficient when running in an environment with few applications alongside and low-intensity workloads, but when a large number of applications or high-intensity workloads exist in the same environment as your ILM system, you can start to run into 'noisy neighbour' issues, where one application starts choking the performance of others in the system.'
In a multi-tenant environment, this can be a huge problem as other companies’ workloads can have a knock-on effect on the performance of yours. To solve this problem, performance guarantees are vital.
'A storage system that enables cloud providers to strictly define the size and performance characteristics of individual storage volumes is essential in achieving this and is only made possible with all-flash scale-out storage solutions,' he argues.
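What strictly defined performance characteristics mean in practice is a per-volume quality-of-service contract. A hypothetical sketch, loosely modelled on the min/max/burst IOPS idea SolidFire popularised (field names and figures are invented for illustration):

```python
# Hypothetical per-volume QoS contract; field names and figures are
# invented to illustrate guaranteed floors versus ceilings.
volume_spec = {
    "name": "ilm-hot-tier-01",
    "size_gib": 2048,
    "qos": {
        "min_iops": 5_000,     # guaranteed floor, immune to noisy neighbours
        "max_iops": 15_000,    # sustained ceiling
        "burst_iops": 30_000,  # allowance for short-lived spikes
    },
}
```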
> See also: The outside-in battle for the soul of the cloud
But whether they use cloud, physical storage, SSD or HDD storage for various datasets, moving data between tiers is a critical part of ILM for any organisation. Automated ILM software, argues Tabor, is no longer going to be a 'nice to have.'
'Old-fashioned, manual methods of ILM may be 'good enough' for now, but for an organisation to remain competitive, moving to automated ILM tools will be a must,' he says.
Jennings agrees that ILM is definitely a worthwhile investment. Gartner states that enterprise data capacity is growing at 40% to 60% year over year on average, and as data volumes continue to explode, businesses easily lose track of what they have in storage. This results in inefficiency and unnecessary cost.
'More importantly though, not all information should live forever,' says Jennings. 'There is data that you want to save, data you want to age for set time periods and data that you want to destroy immediately. This does not typically get carried out efficiently unless you have an ILM system to monitor and automate the process.'
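The keep/age/destroy split Jennings describes reduces to a retention table plus an expiry check. A hypothetical sketch (the record classes and retention periods are invented for illustration):

```python
from datetime import date, timedelta

# Hypothetical retention rules: keep long-term, age out after a set
# period, or destroy immediately. Classes and periods are invented.
RETENTION = {
    "financial_record": timedelta(days=7 * 365),  # save for seven years
    "customer_log": timedelta(days=365),          # age out after a year
    "temp_export": timedelta(days=0),             # destroy immediately
}

def is_expired(record_class: str, created: date, today: date) -> bool:
    return (today - created) >= RETENTION[record_class]

print(is_expired("temp_export", date(2015, 6, 1), date(2015, 6, 1)))  # True
```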