On its own, the label ‘geo-distribution’ sounds like something to do with Google Maps, or maybe digital photography. Applied to the world of modern distributed cloud database architecture, though, it’s a meaningful phrase for a growing number of developers: it signifies using powerful replication and data geo-partitioning capabilities to achieve the low latency, high resilience and regulatory compliance your applications need.
To understand and achieve this, we need to work backwards and identify the features an ideal cloud application should have: in summary, resilience, performance and compliance. All three are of equal importance, even though current discussions tend to focus on resilience, and you must explicitly ensure you have each of them. Too many IT people innocently believe they get these features by default simply by being ‘in the cloud’. That is far from true.
The first important feature is resilience. This means putting data in multiple places, so that if you have a failure, the data is always available somewhere else. In many ways, it is the latest step in the long progression of enterprise IT business continuity and disaster recovery: a way to ensure there is no single point of failure in an enterprise application. If Building A goes up in flames or fills with storm water, there’s a Building B (or, in this case, a virtual data store) as backup.
In the cloud era, resilience is provided by ‘availability zones’. Typically, your supplier will put your data in two different zones, so if power fails in one, you’re still available in the other. But as the world gets more complex and we have to deal with issues such as political instability and the effects of climate change, you also need to consider ‘failure zones’: the things that might actually go wrong. A failure zone could be an individual rack, a whole data centre, or, if you’re in the cloud, an entire region, e.g., Amazon’s Dublin data centre or Google’s New York data hotel.
You need to decide the level of protection you are looking for, and choose a failure-zone scale based on your requirements. For example, you might want to be protected against Amazon ceasing to operate in Ireland, or against a single data centre being knocked out by a power cut. Once you’ve identified that failure zone, you need to build a database that is always available in more than one of them. Like it or not, your business continuity plan needs cloud resilience across more than one possible failure zone; a minimal sketch of the idea follows.
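To make the ‘scale’ point concrete, here is a small Python sketch. The zone names, shard names and placement map are illustrative assumptions, not any vendor’s API; the point is that the same replica placement can be resilient at the availability-zone scale yet still have a single point of failure at the region scale.

```python
# Illustrative sketch: check replica placement against two failure-zone scales.
# Zone names like "eu-west-1a" are assumptions, not a vendor's real topology.

placement = {
    "orders":    ["eu-west-1a", "eu-west-1b", "eu-west-1c"],  # one region, three zones
    "customers": ["eu-west-1a", "eu-west-2a", "us-east-1a"],  # three regions
}

def region_of(zone: str) -> str:
    # Assumes zone names of the form "<region><letter>", e.g. "eu-west-1a" -> "eu-west-1".
    return zone[:-1]

def is_resilient(zones, scale) -> bool:
    """Resilient at a given scale if the replicas span at least two failure zones."""
    return len({scale(z) for z in zones}) >= 2

for shard, zones in placement.items():
    print(f"{shard}: zone-resilient={is_resilient(zones, lambda z: z)}, "
          f"region-resilient={is_resilient(zones, region_of)}")
```

The ‘orders’ placement survives the loss of any one availability zone but not the loss of the region that contains all three; deciding which of those scenarios your business must survive is exactly the failure-zone decision described above.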
From ‘available’ to ‘fail-able’?
At this point, I should explain why organisations don’t get this by default in ‘the cloud’. Just because you put your database in the cloud doesn’t mean it’s inherently resilient; you still need to do the work of ensuring each piece of your data is held in more than one place, and that includes multi-cloud deployments. The cloud gives you lots of servers in one location, so your application runs in the cloud, but you don’t automatically get a database that spreads its data across different regions. Even today, many cloud databases are local to a single geographic location, so you may have application resilience but still need to take the extra step of putting the data in more than one data centre. For example, plenty of organisations deploy in a single Amazon region, so if that region goes down, their applications go down with it. Regional brownouts have happened many times, and they are becoming more frequent. There is also the danger of bad actors and hacking.
The second feature of geo-distribution is performance. This means having the data close enough to your users that, when they want to use it, there is no perceptible lag or ceiling on what they can do with it. Having data close to your users is a good thing; having it a long way from them is not. In many business scenarios there is growing pressure to minimise latency (the time lag in fetching data), because great customer experience is often determined by how responsive your website is.
So, if you can put the right data in the right place with geo-distribution, you can offer a much better user experience on your website and transactional systems. If you’re running an online company globally, everything might go through one data centre in London or New York. As your business grows and you add more countries and business lines, the average travel time for the data rises with it. In practical terms, that might mean going from a couple of milliseconds to a UK-based data centre to 300, 400, 500 or even 600 milliseconds once requests have to cross the Atlantic and back several times. Delivery becomes increasingly slow, can cost you more, and makes for a poor user experience.
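Some rough, assumed numbers make the point. A single transatlantic round trip costs on the order of 80 milliseconds, and a typical page interaction chains several sequential round trips (connection setup, dependent queries), so the lag multiplies:

```python
# Back-of-the-envelope latency arithmetic with assumed, rounded figures.
# Real numbers vary by network path; the shape of the result is what matters.

ROUND_TRIP_MS = {
    "same city":      2,   # user and data centre a few miles apart
    "same continent": 25,
    "transatlantic":  80,
}
SEQUENTIAL_ROUND_TRIPS = 6  # illustrative: handshake plus a few dependent queries

for distance, rtt in ROUND_TRIP_MS.items():
    print(f"{distance}: ~{rtt * SEQUENTIAL_ROUND_TRIPS} ms of waiting per interaction")
```

Six round trips at 2 ms each is unnoticeable; the same six at 80 ms is nearly half a second, which users experience as a slow site.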
The final key feature of geo-distribution is compliance. This means having the data where you need it to be in order to meet legal requirements. Ideally, you’d have a single logical database in which German customer data is only stored in Germany, US customer data is only stored in the US, and Asia-Pacific data is only stored in Asia-Pacific. If you can build one environment and still meet legal compliance requirements as they evolve, driven by data protection regulations around the world, you’re in cloud nirvana! Geo-distribution is emerging as the easiest and most performant way of achieving this.
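As a sketch of what that looks like, here is the compliance routing rule expressed in Python. The residency map and store names are hypothetical placeholders; in a geo-distributed database this placement is typically declared once (for example, as a partitioning rule on a country column) rather than written as application code like this.

```python
# Hypothetical residency map: which store may legally hold which customers' rows.
RESIDENCY = {
    "DE": "store-frankfurt",   # German customer data stays in Germany
    "US": "store-virginia",    # US customer data stays in the US
    "SG": "store-singapore",   # Asia-Pacific data stays in Asia-Pacific
}

def compliant_store(country: str) -> str:
    """Return the only data store allowed to hold this customer's data."""
    if country not in RESIDENCY:
        raise ValueError(f"no compliant placement configured for {country!r}")
    return RESIDENCY[country]

def save_customer(record: dict) -> None:
    target = compliant_store(record["country"])
    print(f"writing customer {record['id']} only to {target}")

save_customer({"id": 42, "country": "DE"})  # -> writing customer 42 only to store-frankfurt
```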
Built-in support for multiple locations
This idea may seem a bit abstract, so let’s make it more concrete. A company we work with runs COVID testing for members of a specific community who meet at different locations around the world. All three key features of geo-distribution matter here:
- It needs to be resilient so if there’s a problem in one area, such as the Middle East, the data is always safe somewhere else.
- It needs to be fast, so test results come back quickly.
- Some data must be repatriated because of EU and British compliance requirements.
I hope this has clearly illustrated why geo-distribution is so useful and important. But it still leaves some questions: how do you build a database, or combination of databases, with built-in support for multiple locations, that sits close to the people who need to use it, and that only ever stores data where it is legally required to be?
In a nutshell, that’s the high-level distributed database challenge. Frankly, it’s not a challenge everyone ‘gets’ yet. Developers just want a database they can easily use; they generally don’t care where it lives. Management, the CIO, the CSO (chief security officer) and so on are often the people driving adoption of geo-distribution capability in their cloud databases. That’s partly for the good reasons we’ve outlined, and partly because some transactional database use cases really do take advantage of geo-distribution: vehicle telemetry, stock bids and asks, shipment information and credit card transactions, which many banks would love to see finally coming off their mainframes.
So, while ‘geo-distribution’ is not the greatest name the IT industry has come up with, it is nonetheless one of the most interesting and useful concepts to come out of the race to the cloud. I recommend you explore how it could benefit your business, now and in the future.