The effects of a disaster – be it natural, ransomware or hardware failure – are far-reaching for businesses.
One example is downtime, which can lead to data loss as well as damage to customer trust and brand reputation. According to data from a 2023 LogicMonitor study, 96 per cent of decision makers and IT managers worldwide experienced at least one period of downtime, with many experiencing multiple instances in a single year.
So, how do you optimise your back-up and disaster recovery strategy to handle such incidents? We caught up with N-able’s Chris Groot and Stefan Voss at the Empower Prague conference to find out.
What are the most common vulnerabilities that you see in businesses when it comes to their back-up and disaster recovery strategies? What are the gaps that they tend to have?
Chris: One of the biggest gaps is the lack of time. Disaster recovery takes technology, but it also takes people and process. The ability to spend time on testing and having people in place – because there’s a labour shortage and events are pretty infrequent – I think that’s where the challenge really lies.
I would add the changing nature of the threat landscape – knowing what the next type of disaster is going to look like – it’s different than before. It used to be Exchange servers, Active Directory, that sort of thing. Now, it’s running Azure AD, Microsoft 365. So, what was best practice five years ago has now changed, and they need to be prepared, to be able to run the dress rehearsal. That experience is tough and takes time. That’s a challenge in any environment.
Stefan: The only thing I would add is that there are different types of disaster: natural disaster, flood, hardware failure, file corruption. But the number one, time and time again, is how to restore from a cyber attack. Ransomware is the buzzword, but it could be all kinds of cyber attacks.
The difference between every other disaster and the cyber attack is really the fact that you’re going to be in an incident response, so you’re going to work with other people. Maybe you have a consulting firm that you partner with, but if you don’t even know who the people are, well, how are you going to work together when things go wrong? You’re going to have a real problem. Think about your objectives:
- How fast do you want to recover?
- How granular do you need the restore points to be?
- Where do you restore to when something happens?
You’re probably not going to want to restore ransomware back into production really quickly. You get incident recovery, but incident recovery of, what, malware? That’s definitely not what you want. So, you’ll probably need a sandbox.
Once you have that in place, you really need to practise, to run a real simulation. Then you’re going to be in a better spot, but for that you need time.
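The objectives Stefan lists can be written down as measurable targets and checked after every drill. The sketch below is only an illustration, not anything N-able described: it assumes the common industry terms recovery time objective (RTO, how fast you recover) and recovery point objective (RPO, how old the newest usable restore point may be), and the names `RecoveryObjectives` and `check_drill` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical container for the three objectives in the list above."""
    rto: timedelta        # how fast you want to recover
    rpo: timedelta        # how granular the restore points need to be
    restore_target: str   # where you restore to, e.g. an isolated sandbox


def check_drill(objectives: RecoveryObjectives,
                incident_at: datetime,
                last_restore_point: datetime,
                restored_at: datetime) -> dict:
    """Compare one drill's measured timings against the stated objectives."""
    data_loss_window = incident_at - last_restore_point  # work lost since last back-up
    time_to_recover = restored_at - incident_at          # outage duration
    return {
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": time_to_recover <= objectives.rto,
    }


# Example values are made up for illustration only.
objectives = RecoveryObjectives(
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    restore_target="sandbox",
)
incident = datetime(2024, 1, 10, 9, 0)
result = check_drill(
    objectives,
    incident_at=incident,
    last_restore_point=datetime(2024, 1, 10, 8, 30),
    restored_at=datetime(2024, 1, 10, 12, 0),
)
print(result)
```

Recording the timings from each drill in this way gives a simple record of whether the targets are realistic before a real incident tests them.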
What are the advantages of using a cloud-based solution as part of your full strategy?
Stefan: It’s greater agility, faster time to value. There’s another benefit that’s related to security. If the back-up infrastructure, meaning the server that runs the back-up software and the storage that holds the back-up copies, is on the same network as the server that the hackers are going to go after, then you need to assume that they’re going to go after the back-ups as well. Why? If I can take out your back-ups, it’s more likely you’re going to give me the money and I’m going to get the ransom. Having the back-up infrastructure in the cloud has the benefit of a smaller attack surface because it’s harder to get to.
If you’re looking for a back-up and disaster recovery solution, what kind of questions should you be asking your provider?
Chris: So first of all, just in terms of disaster recovery – is it secure? And then it’s those time savings. Am I going to meet my needs in terms of being competitive in my space and automating as much as possible?
Stefan: Worry about the four Ts:
- Can I trust it?
- Can I test it?
- Will it save me time? How much?
- Can I tie it to my environment?
Chris: The second piece of the equation is really about the time invested to support it. Will it scale? Is it multi-tenant? Can I work in these disparate environments that I serve? Does it more or less run by itself, or do I need to spend an hour or two hours a day making it operate correctly? That’s the difference. If you look at a lot of different software vendors out there, and you compare their list of features, they tend to look pretty similar. It’s not a question of what you do in terms of new features, but how well those features actually work. That then relates back to your time.
How often would you recommend a business tests its disaster recovery?
Stefan: I don’t think an annual disaster recovery drill is often enough. You do have to think about the need to get a server somewhere. When the real thing happens, you don’t want to be in a procurement panic mode. I would really think about setting that environment up, somehow, thinking about the objectives, and then testing that runbook. You can only do that in the wild. Tabletop is nice, nothing against the tabletop, but you need to really have someone take something down. So maybe you do the tabletop or something like that every half year and, if you really want to, a fuller disaster recovery drill once a year, where you go in and really do some nasty stuff and see how you respond. That would probably be ideal, because certain things you can’t really simulate or replace with a tabletop. You’ve really just got to go in and simulate actual examples without sacrificing production.
Chris: I’ll just try to summarise: there are different levels of recovery testing. There’s the stuff of will the server boot in a different environment? Does that mean all the data is there, everything’s good? That’s level one. Level two is going deeper, where you’re testing your people and process that wrap around that, because of course, a server booting doesn’t mean that everyone can connect to it and that business can actually continue. Level two is more like your once-every-six-months approach. I think that’s very pragmatic in terms of that lookout and I think that’s healthy.
Just putting a wrapper on that, who’s going to contact your insurance companies or cyber insurance? What will they allow us to do before we actually make our next move? But I think what we’re basically highlighting is how complex the answer is to what seems like a simple enough question of how often you should test.
What do insurers look for in a disaster recovery strategy?
Stefan: So first of all, the questionnaire generally goes through all the buckets. They’re almost always the same. It goes from prevention, to detection, to at some point, incident response and everything in between.
The questions are going to be very pointed. Do you have antivirus? Do you have devices or servers that you are not using? Are you using antivirus or endpoint detection and response (EDR)? Then there’s going to be preventative, and there’s going to be user-related stuff, like, is everybody a super user? It’s not going to be that, but you get the idea. Are you applying multifactor authentication? Everything around authorisation, users, the privileges, will be a bucket.
Then there’s the disaster recovery bucket, which is wrapped in a broader incident response type thing. They’ll ask things like:
- Do you have an incident response plan?
- Is it a document?
- Is the security plan coordinated with the back-up or the disaster recovery plan?
- Is this disaster recovery runbook documented?
- Has it been tested?
- Is it right?
- Are the copies off-site? Or are they on the same network?
Chris: I would say a baseline requirement in our space, that is consistent on these questionnaires, is the copy of data. How frequently is it taken off-site? That’s the primary key question. That may be slightly dated, but it’s still a key question. What I would say is it’s a very rapidly changing space, because payouts and complexities are growing and attackers are doing new things.
So, what we thought was the right question a year ago is now becoming a more complex question. Every month, insurers have had to pay out, and now and again they suddenly realise the risk profile is not the same as they thought it was, and they need a better handle on the risk. The questionnaires are getting more and more serious as we go. So, for our customers, our architecture and our approach to security are aligned very well with what cyber insurers are looking for.
Stefan: If you don’t have multifactor authentication, that’s going to be a problem because anybody who harvests and finds the right credential can pretend they’re you. Patching is a big bucket, because when there are unpatched systems, you’re going to have lateral movement. Then they’ll probe whether you’ve tried the recovery at least once. Those are the big buckets.