Reducing the impact of IT outages

Digital transformation is well underway in many organisations. IDC forecasts that businesses worldwide will spend nearly $2 trillion on digitalisation projects by 2022. As the adoption of new technologies continues at pace, more and more customer touchpoints are becoming digital – and their availability is increasingly mission-critical. Customers expect a seamless, efficient and reliable experience on all channels, at all times.

However, keeping services running smoothly is not always an easy task, as the major IT outages experienced by British Airways, Target, Facebook and Twitter in 2019 showed. A recent IT Outage Impact Study commissioned by LogicMonitor, the leading SaaS provider of unified IT infrastructure monitoring, examined just what it takes to keep the lights on and services running.

The independent survey of 300 IT decision makers in the UK, US, Canada, Australia and New Zealand investigated how organisations approach the mammoth task of detecting, mitigating and preventing IT issues. It brought to light a reality that’s in stark contrast to the image of the brave new digital world: Businesses are concerned about their ability to avoid costly outages, mitigate downtime, and reliably provide the service availability that customers and partners demand.

Combating costly downtime

Overall, survey respondents agreed that availability and performance are their top priorities, ahead of security and cost. But although IT teams seem to be intensely focused on keeping their networks running at peak performance, they are still not able to prevent downtime, with 96% of the surveyed businesses admitting that they experienced at least one IT outage in the past three years.

Surprisingly, respondents also reported that more than half of the downtime could have been prevented. Among the most common causes were network and infrastructure failures, human error, surges in usage, and software malfunction – some of which could have been detected and dealt with before they affected service quality.

Asked how confident they were in their ability to prevent future outages, IT decision makers had a pessimistic outlook, with more than half (53%) expecting to experience a brownout or outage so severe that it would make national media headlines. The same proportion of respondents was worried that someone within their organisation could lose their job as a result of a severe outage.

Negative press coverage, damage to an organisation’s reputation and possibly severe career implications aside, downtime is also costly. A drop in productivity, lost revenue and compliance-related costs were all cited by the survey respondents as costs associated with both IT outages and brownouts – defined as periods of dramatically reduced or slowed service. These costs can add up quickly. On average, organisations with recurrent outages and brownouts face downtime mitigation costs 16 times higher than those of organisations with few or zero outages. In addition, nearly twice the number of team members and double the time are required to troubleshoot downtime-related problems.

How to improve availability

If, as LogicMonitor’s global IT survey suggests, more than half of outages and brownouts are avoidable, then every business should be working proactively to prevent disruptions. Yet even the most highly skilled IT professionals seem to be unsure of how to tackle the task. Careful advance planning, a well-prepared team and powerful monitoring software all go a long way in helping organisations minimise downtime.

Here are some key steps every organisation can take:

Implement comprehensive IT monitoring. Many organisations run a hybrid IT environment that combines infrastructure both on-premises and in the cloud. Using separate monitoring tools for each platform is not only inefficient, but also prone to error. Instead, businesses should choose a software solution that covers their entire infrastructure landscape and lets the team monitor IT systems through a single pane of glass. To ensure the solution integrates with present as well as future technologies, selecting a platform that can scale and expand is key.

Use a monitoring solution that gives the team early visibility into trends that indicate there is trouble brewing. Data forecasting is a useful tool; it allows organisations to identify impending failures and proactively prevent issues before they impact the business. Early alerts enable teams to fix single points of failure that might cause a system to go down. An additional way of preventing downtime is to build a high level of redundancy into the monitoring platform.

Make sure you have a detailed response plan for IT outages. Define responsibilities, as well as processes for who to involve and when. This emergency plan may never be needed, but it’s imperative to have clear procedures for managing an outage, from escalation and remediation to communication and root cause analysis.
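
To make the data forecasting step above more concrete, here is a minimal sketch of how a trend in a monitored metric might be turned into an early alert. It is an illustration only, not LogicMonitor's method or any particular product's API: the function name hours_until_threshold, the sample disk-usage readings, the 90% threshold and the 24-hour alert window are all assumptions chosen for the example. It fits a straight line to evenly spaced readings and estimates how long it will be before the line crosses the threshold.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    samples: Sequence[float],
    threshold: float,
    interval_hours: float = 1.0,
) -> Optional[float]:
    """Estimate hours until a metric crosses `threshold` (hypothetical helper).

    Fits a least-squares straight line to evenly spaced samples and
    extrapolates it. Returns None when the trend is flat or falling,
    and 0.0 when the fitted line is already at or past the threshold.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Least-squares slope and intercept with x = 0, 1, ..., n - 1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # no upward trend, so no projected breach

    # Solve slope * x + intercept = threshold, measured from the last sample.
    steps_remaining = (threshold - intercept) / slope - (n - 1)
    return max(steps_remaining, 0.0) * interval_hours


if __name__ == "__main__":
    # Assumed hourly disk-usage readings, in percent.
    disk_used_percent = [70.0, 72.0, 74.0, 76.0, 78.0]
    eta = hours_until_threshold(disk_used_percent, threshold=90.0)
    if eta is not None and eta <= 24:
        print(f"Early alert: disk projected to reach 90% in about {eta:.0f} hours")
```

A straight-line fit is the simplest possible forecast and ignores seasonality and sudden surges in usage, so in practice it would complement, rather than replace, the forecasting built into a monitoring platform.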

Alongside data, availability has become the most valuable commodity of the digital age. No organisation is immune to IT failures – but those that take the right preventative measures can greatly reduce the impact of outages.

Written by Mark Banfield, chief revenue officer at LogicMonitor
