Why did British Airways suffer such an extreme IT meltdown?

This bank holiday weekend, travellers in the UK remained grounded after a disastrous computer outage disrupted British Airways’ global operations, including at both Heathrow and Gatwick airports. The system was only down for a few minutes, but the essential backup system failed to kick in.

The carnage has begun to subside, but the financial and reputational costs for the airline will be significant, with some experts predicting a compensation bill of around £120 million, and some predicting double that.

The downed IT system affected most areas of operation, from booking and baggage handling to mobile phone apps and check-in desks, with over 1,000 flights affected and eventually cancelled. No one is entirely certain what caused the system’s malfunction. There is no evidence of a cyber attack, but some have suggested that BA’s outsourcing of hundreds of IT jobs to India in 2016, in an effort to save on costs, may have been responsible.

However, BA chief executive Alex Cruz said the IT failure was not due to technical staff being outsourced from the UK to India. “There was a power surge and there was a back-up system, which did not work at that particular point in time. It was restored after a few hours in terms of some hardware changes… we will make sure that it doesn’t happen again,” Mr Cruz said in his first interview.

IT outages are seemingly becoming increasingly common in the airline industry, following Delta’s IT failure last year. When one happens, the backup system simply must do its job. So why didn’t it?

CAST, the software analysis and measurement firm, believes that airlines must address fundamental code issues at a structural level to protect their IT systems against glitches. Too often such systems are patchwork quilts of code fragments. These are fine if checked at a structural level, but that checking adds to IT costs, and in February last year BA was reported to be firing 900 IT staff to save costs.

Bill Curtis, SVP and chief scientist at CAST, said: “Airline computers juggle multiple systems that must interact to control gate, reservations, ticketing and frequent fliers. Each of those pieces may have been written separately by different companies. Even if an airline has backup systems, the software running those likely has the same coding flaw.”

“Tracking down a software flaw can be very difficult. It’s like investigating crime; there is a lot of data they’ve got to sift through to figure out what actually happened.”

From another perspective, David Drai – the CEO and founder of Anodot, a business intelligence and anomaly detection company – suggests this situation could have been curbed had the right business intelligence services been integrated into BA’s systems.

This practice correlates all of a company’s raw data to distinguish anomalous behaviour from ‘normal’ data, catching incidents before they become crises. Once an issue is detected, technical teams are alerted so they can resolve it before it escalates – exactly what British Airways needed 36 hours ago.
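To make the idea concrete, here is a minimal sketch of one common form of anomaly detection: flagging a reading that sits far outside the recent ‘normal’ range. It is an illustration only, not a description of Anodot’s product or of BA’s systems. The metric name `check_in_latency_ms`, its values, the window size and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of readings that deviate from the trailing window's
    mean by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical readings: response time of a check-in service in milliseconds.
check_in_latency_ms = [120, 118, 125, 122, 119, 121, 123, 900, 124, 120]

for index in find_anomalies(check_in_latency_ms):
    # A real system would page the on-call team here instead of printing.
    print(f"Alert: reading {index} looks anomalous ({check_in_latency_ms[index]} ms)")
```

With these made-up readings, only the 900 ms spike is flagged. A production service would go further and correlate many such metrics at once, which this sketch does not attempt.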


Nick Ismail

Nick Ismail was editor of Information Age from 2018 to 2022 before moving on to become Global Head of Brand Journalism at HCLTech. He has a particular interest in smart technologies, AI and...
