At the onset of summer 2003, IT directors were sleeping soundly in their beds, no longer haunted by the spectre of disaster. After 9/11, the majority of large companies had revised their fail-safe back-up systems and processes, and it seemed that the IT world was more prepared than ever to keep organisations humming – under any circumstances.
Try telling that to the thousands of British Airways passengers whose flights were cancelled or delayed one chaotic morning in September, after a power failure at Heathrow shut down the computer systems responsible for check-in and baggage services. Or patients at an Ottawa cancer centre in Canada, where critical servers were waterlogged after the air conditioning system switched off and began leaking during a recent blackout. Or office workers in Sydney, Australia, shut out of their buildings for a day after a power failure deactivated computerised locking systems.
Add to these recent cases the effect of the SoBig and MSBlast computer viruses – thought to have infected up to 30,000 systems an hour at their July peak – not to mention the almost unprecedented blackout in North America the following month, itself possibly caused by a virus-damaged server, and the picture is one of failure in the security, resilience and redundancy of the world’s IT infrastructure.
Wake-up call
The huge blackouts in London and North America stripped away any lingering veneer of impregnability to reveal IT systems more vulnerable than even the most pessimistic had suspected.
Companies affected by blackouts are usually reluctant to disclose their experiences, in an attempt to protect their reputation with customers and partners. But some common threads have emerged from the recent crises.
An astonishing three in four US companies in the affected areas surveyed by Info-Tech Research said they were disrupted by the North American blackout in some form, either directly or through one or more of their suppliers’ systems going dead. Of those, the vast majority admitted they were ill prepared for a crisis on that scale. “This blackout demonstrated that most IT departments, especially those in mid-sized companies, are still flying by the seat of their pants,” says Jason Livingstone, an Info-Tech analyst. “Disaster recovery is simply not on their list of priorities.”
That view might seem unfair but, alas, it is not entirely so. There is mounting evidence that some organisations are failing to plan for business continuity events at all; and of those with plans in place, many are failing to test or enforce them properly.
A recent survey of UK businesses – carried out by Infoconomy, the publisher of Information Age, in association with American Power Conversion (APC), a vendor of batteries, back-up systems and generators – found that 65% had suffered business disruption due to a power outage. And yet 62% still did not have an overall strategy in place for addressing such events.
“We believe that at least 90% of UK companies have no form of contingency planning in place,” says Jim Simmons, CEO of SunGard Availability Services, a business continuity specialist. “Only 8% of organisations without business continuity plans can expect to survive a ‘disaster’. For those companies, the power outages [in London and North America], occurring just weeks before the second anniversary of 9/11, must have been a huge wake-up call.”
Another problem is that companies often fail to carry out regular simulations of business continuity events. About one in four IT directors in the UK either do not know when their business continuity management plan was last tested or think it was probably more than a year ago, according to a recent survey by storage vendor Hitachi Data Systems (HDS).
But even having a detailed and well-rehearsed plan in place does not guarantee protection. Human error still has to be factored in. Moreover, business continuity projects are often funded off-budget – raising the possibility that lapses in IT management practices will creep into the system. The HDS study found that IT directors ranked human error just behind fire as the most likely cause of business continuity events. (Interestingly, this does not tally with the most common causes of disaster recovery invocations – see table, ‘Causes of UK disaster recovery invocations’.)
Of course, IT processes are far from error-free. Internet service provider Lycos’s email services recently went down for four days after mistakes were made during a routine job to load new back-up software onto a web server.
Neil Rasmussen, APC’s senior vice president and chief technical officer, says the mistakes that IT experts make are many and varied. “Some protect their servers [with uninterruptible power supply] but forget about the hubs. Some overlook ‘back doors’. Many do not have sufficient ‘runtime’ [back-up power capacity]. Others don’t install the management software,” he says.
Enterprises should also guard against complacency, says John Sharp, chief executive of the Business Continuity Institute (BCI), which seeks to raise awareness of business continuity matters and has developed a code of practice for organisations on how to prevent and cope with business disasters. “There is often a sense that ‘this will never happen to us’. One common view is that, if I am sitting in an office in East Grinstead, then I’m not going to be affected by a terrorist bomb in London. Well, you’d better think again,” he says.
Lessons not being learned
When the lights went out in North America on 14 August 2003, plunging more than 50 million people into darkness, the first thing that many did was try to call family and friends on their mobile phones. When that didn’t work, the next step was to try to email. But like large sections of the wireless network, many Internet service providers were also down. It did not take long for people to realise that network operators, ravaged by the telecoms downturn, had cut investment in back-up power and diesel generation facilities. “Bad engineering? No, greed and bad financial decisions,” wrote former BT chief technologist Peter Cochrane in one particularly acerbic article.
The cost of providing adequate business continuity infrastructure is often viewed in the same way as insurance and other risk-management costs. Such thinking can have dangerous consequences, says the BCI’s Sharp, since those costs can always be cut during difficult times. Prevention also pays, given how many recoveries fail – some studies put the failure rate as high as 50%. Avoiding disasters may also keep a business afloat: surveys show that small businesses often go under while waiting for an insurance payout. Other businesses suffer fatal damage to their reputation. And it is not always possible to keep events out of the public eye. In one case in Birmingham in the early 1990s, the firebombing of a law firm was reported on the local news, prompting clients to flee in droves. The firm survived – just.
There may be a certain amount of 20/20 hindsight involved here and, admittedly, not all disasters can be easily anticipated. A case in point was the initial outbreak of the SARS virus in Southeast Asia, which caused widespread disruption to businesses. Some IT workers in Hong Kong were quarantined in their homes. In Singapore, several banks were so worried about having their premises quarantined that they set up impromptu back-up IT departments within Hewlett-Packard’s local business continuity centre.
Now, with the SARS virus seemingly contained, Singapore’s government has moved swiftly to avert a repeat of the business disruption. From 2004, it will become the first country in the world to certify companies as complying with business continuity standards, based on a code drawn up by the BCI. But regulators elsewhere have not yet grasped the nettle. Although the BCI code has been translated into many languages, it seems unlikely that the UK and other countries will adopt similar standards in the short term, says Sharp. At the least, new regulations governing records retention and risk-management procedures, such as Sarbanes-Oxley and Basel II, may ultimately plug the gap.
Best practice
As the events of the past two years have demonstrated, it is virtually impossible for a business to plan for each and every eventuality. But trends in technology can at least help. The movement in the IT industry towards redundant networks and computer systems, storage area networks and long-distance data replication should ease many of the difficulties, as should improvements in IT security at both the application and network level, say experts.
Brian Fowler, HP’s global director of business continuity services, is bullish about the future. “There was an upsurge in sales even before the blackouts,” he says. “The horrific events of September 11 really did show people that they need to make every effort to protect their systems.” Business continuity is now one of the leading priorities for IT directors, he says, which explains why HP has made it one of the key strands of its ‘adaptive enterprise’ strategy, underpinned by technologies such as server clustering designed to remove single points of failure.
But even customers of companies such as HP, SunGard and APC can ill afford to sleep easy. Blackouts are more common in winter, when demands on the electricity grid are greater. Energy experts are already predicting a fresh spate of outages in the winter months of 2004, caused, it is argued, by under-investment in infrastructure since electricity markets were deregulated.
Ironically, perhaps, developments in IT and other areas of advanced electronics may be adding to the problem. Some estimates suggest that 70% of power consumption today goes to industrial-grade loads, with the rest drawn by sensitive electronics such as PCs and televisions. Within the next 10 years, some energy experts believe this position could be inverted – suggesting that demand from power-hungry computing devices will put even greater pressure on an already overloaded power grid.
The BCI says that all organisations will be affected by a business continuity event of some description one day. Some will rise to the challenge; many will fail. When the crisis threatens, the worst thing an affected company can do is stick its head in the sand, says Sharp. “It is imperative to always move as quickly as the crisis. Never let it get ahead of you,” he says. Instead, organisations should quickly draw up an action plan and let customers and suppliers know that there is a potential problem, and that it is being dealt with. “Clients will always be sympathetic if they feel that something is being done about their [sales] order. Above all, they don’t want your crisis to become their crisis.”