According to the American Trucking Associations, over 10.5bn tons of freight are moved every year across the US by some 3.6m trucks and drivers.
What most people don’t realise is that along with canned soup, paper towels and the myriad other goods that power the US economy, these trucks carry literally petabytes of data.
It’s true. In the era of 5G mobile networks and Fiber-to-the-Home Internet connectivity, two of the biggest players in cloud services – Amazon and Microsoft – have launched large-scale physical data transfer services. Amazon’s Snowmobile service even boasts “…dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, and an optional escort security vehicle while in transit.” Microsoft describes its Azure Data Box Heavy as a “ruggedized, self-contained device designed to lift 1 PB of data to the cloud.”
It turns out that the physical movement of massive quantities of data – to the cloud and elsewhere – is a thing in both industry and science. The Event Horizon Telescope collaboration’s recent groundbreaking image of a black hole was produced from data collected at sites worldwide and then flown on hard drives to central processing facilities, including MIT Haystack Observatory, for analysis. The reason? Transferring those quantities of data at typical Internet speeds would have taken over 25 years.
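A quick back-of-envelope calculation shows why. The figures below are illustrative assumptions – roughly 5 PB of recorded observations and a sustained 50 Mbit/s wide-area link – not the collaboration’s actual numbers:

```python
# Back-of-envelope only: data volume and link speed are assumed figures.
data_bytes = 5e15              # ~5 PB of recorded observations
link_bps = 50e6                # sustained 50 Mbit/s wide-area link

seconds = data_bytes * 8 / link_bps
years = seconds / (60 * 60 * 24 * 365)
print(f"~{years:.0f} years to transfer at that rate")  # prints "~25 years"
```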
For the moment, we’ll leave aside the implications of this apparently vast gap in data transfer technology. The more pressing issues are what happens to physically moved big data while it’s in transit, and what happens once it reaches its destination.
It’s data, not your kitchen dishes
When I moved several years ago, I packed up my kitchen dishes and didn’t see them until they arrived with the movers at my new home. They were unchanged (and thankfully unbroken) when they arrived, and my use of them resumed as if I hadn’t been eating from paper plates in their absence.
Data doesn’t work like this
While the model of one-off physical data movement may work for scientific projects like the Event Horizon Telescope’s, for businesses the strategy of shipping truckloads of unstructured data from on-premises data lakes to the cloud is problematic. The reason? Data is not static; it changes constantly. And business doesn’t stop while the truck is on the road. While the driver is eating breakfast, the business whose data is in the trailer may have changed so dramatically that the data is barely relevant. At the very least, the context of the data may have evolved to the point where achieving consistency once it is safely delivered is next to impossible.
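To make the consistency problem concrete, the sketch below compares a manifest of the source taken when the appliance was loaded against the same source days later, when the shipped copy finally arrives. The directory path and hashing choice are hypothetical; the point is simply that whatever changed in between is missing or stale in the shipped copy:

```python
import hashlib
from pathlib import Path

def manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to a hash of its contents."""
    return {
        str(f.relative_to(root)): hashlib.sha256(f.read_bytes()).hexdigest()
        for f in root.rglob("*") if f.is_file()
    }

# Snapshot taken when the appliance is loaded onto the truck.
shipped = manifest(Path("/data/lake"))      # hypothetical source path

# ...days later, when the appliance reaches the cloud provider, the live
# source has kept changing; diff it against what was actually shipped.
current = manifest(Path("/data/lake"))

added = current.keys() - shipped.keys()
deleted = shipped.keys() - current.keys()
modified = {p for p in shipped.keys() & current.keys() if shipped[p] != current[p]}

print(f"{len(added)} new, {len(deleted)} deleted and {len(modified)} modified "
      f"files are missing or stale in the shipped copy")
```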
Globally, the vast majority of data in existence has been produced in the past few years. That means that what’s being shipped physically is not legacy data headed for cold storage. It’s the lifeblood of the businesses shipping it, and – because this data is not kitchen dishes – its business value can diminish significantly in transit.
What’s more, data environments are diverse and heterogeneous – not monolithic. Moving data out of data lakes generally involves moving to new standards, and converting unstructured data at the destination can be tricky – for example, when migrating on-premises distributed storage such as Hadoop Distributed File System (HDFS) files and object storage to cloud-based storage services, or when migrating other services such as Hive.
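As a minimal illustration of what that conversion involves at the metadata level, the sketch below rewrites hypothetical HDFS warehouse locations to an object-store layout and emits the Hive DDL needed to repoint each table. The bucket name, namenode address and table list are placeholders, and a real migration also has to handle file formats, permissions and metastore details:

```python
# Minimal sketch: rewrite HDFS locations to an object-store layout and emit
# the Hive DDL needed to repoint each table. All names are hypothetical.
HDFS_PREFIX = "hdfs://namenode:8020/warehouse"
S3_PREFIX = "s3a://example-bucket/warehouse"

tables = {
    "sales.orders":    f"{HDFS_PREFIX}/sales.db/orders",
    "sales.customers": f"{HDFS_PREFIX}/sales.db/customers",
}

for table, hdfs_location in tables.items():
    s3_location = hdfs_location.replace(HDFS_PREFIX, S3_PREFIX, 1)
    # After the files have been copied, the Hive metastore still points at
    # HDFS; each table has to be repointed to its new object-store location.
    print(f"ALTER TABLE {table} SET LOCATION '{s3_location}';")
```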
Finally, consider that most big data is globally distributed. Are there enough trucks, in enough locations, with enough capacity and travel time to move all relevant data for a given business?
The bottom line
Trucks work best for consumer goods. Massive and expensive truck-based data dumps are not an effective replacement for a proactive, advanced cloud migration strategy. Big data stakeholders need to understand that today’s technology enables a more nuanced, sophisticated and – most importantly – non-blocking approach to cloud migration.
Making a business’s data cloud-ready
Rather than one asynchronous dump, IT teams need to aim for a single-pass migration with guaranteed consistency, minimal disruption to operations and maximum target ingest rate – bringing each path live as soon as it is migrated. Companies can then transition to a multi-cloud data environment to take advantage of different clouds’ unique or best-performing services, and avoid being locked into a single cloud vendor agreement, while ensuring data consistency, reliability and security.
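As a rough sketch of that pattern – not any particular vendor’s implementation – the code below does an initial copy of every source path and then drains a change feed captured while that copy was in flight, marking each path live as soon as its data has landed. The copy helper and change feed are hypothetical stand-ins for an object-store client and a change-notification stream:

```python
import queue
import shutil
from pathlib import Path

def copy_to_cloud(path: Path, target_root: Path) -> None:
    """Hypothetical stand-in for an object-store upload."""
    target_root.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, target_root / path.name)

def migrate(source_paths: list, target_root: Path, change_feed: queue.Queue) -> set:
    """Single-pass migration sketch: copy every path once, then apply the
    changes that arrived while the initial copy was still running."""
    live = set()

    # Pass over the source as it exists when the migration starts.
    for path in source_paths:
        copy_to_cloud(path, target_root)
        live.add(path)                      # path is usable on the target now

    # Converge: replay changes captured during the copy instead of freezing
    # the target at a stale snapshot.
    while not change_feed.empty():
        changed = change_feed.get_nowait()
        copy_to_cloud(changed, target_root)
        live.add(changed)

    return live
```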
By adopting tools and strategies that ensure that no data is left behind during cloud migration, we can let the trucker go back to moving dry goods, and let data flow without stoplights.
Written by Jagane Sundar, Chief Technology Officer, WANdisco