Data centre power outages have been making headlines recently, with failures at both Global Switch and British Airways causing chaos for partners and consumers alike.
In the short term, power losses like these can halt critical business functions, leading to lost revenue. In the longer term, the reputational damage caused by such failures can hurt a company’s profits for months, if not years.
However, many power outages like these can be avoided.
With a proactive approach to power chain integrity, organisations can minimise the chances of suffering a power outage, so they can remain operational and, more importantly, profitable.
Full transparency
It is crucial for organisations to document their power chain all the way from where the power enters, through the UPSs and PDUs, and out to every piece of rack-mounted equipment. With this information, they can understand the potential impact of an outage should a particular piece of equipment fail or be taken offline.
They should also track the maintenance status of each power chain device and know when each is approaching the end of its lifecycle.
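To make this concrete, a documented power chain can be held as a simple tree in which each device records what it feeds and, where known, its end-of-life date. The minimal Python sketch below assumes a single-corded chain (each device has exactly one upstream supply); the `Device` class, the `downstream` and `nearing_end_of_life` helpers, the device names and the dates are all hypothetical and not taken from any particular DCIM product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Device:
    """A hypothetical node in a documented power chain: utility feed, UPS, PDU or server."""
    name: str
    kind: str
    end_of_life: Optional[date] = None
    feeds: List["Device"] = field(default_factory=list)  # devices this one supplies


def walk(root: Device):
    """Yield root and every device below it in the chain (depth-first)."""
    stack = [root]
    while stack:
        device = stack.pop()
        yield device
        stack.extend(device.feeds)


def downstream(device: Device) -> List[str]:
    """Names of every device that loses power if `device` fails (single-corded chain assumed)."""
    return [d.name for d in walk(device) if d is not device]


def nearing_end_of_life(root: Device, today: date, horizon_days: int = 365) -> List[str]:
    """Names of devices whose recorded end-of-life date is past or within `horizon_days`."""
    return [
        d.name for d in walk(root)
        if d.end_of_life is not None and (d.end_of_life - today).days <= horizon_days
    ]


# Illustrative chain: utility -> UPS -> PDU -> two servers.
server_a = Device("server-a", "server")
server_b = Device("server-b", "server")
pdu_1 = Device("pdu-1", "pdu", feeds=[server_a, server_b])
ups_1 = Device("ups-1", "ups", end_of_life=date(2018, 3, 1), feeds=[pdu_1])
utility = Device("utility-in", "utility", feeds=[ups_1])

print(sorted(downstream(ups_1)))                             # ['pdu-1', 'server-a', 'server-b']
print(nearing_end_of_life(utility, today=date(2017, 9, 1)))  # ['ups-1']
```

Walking the tree from a failed device answers the "what goes down with it?" question directly. Dual-corded equipment breaks the single-supply assumption, so redundant paths need a graph model rather than a tree.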
Real-time monitoring
Organisations must have real-time monitoring of the power flowing through their data centre’s power chain at any one time, so they can see how much energy each device is using, and where.
Organisations must ask themselves: “Do we have the capability to look at all the information and all the infrastructure components in the facility, and see the entire system in one place through a single pane of glass?”
If the answer is no, they should look into gaining a holistic view that combines real-time monitoring with alarms, giving data centre operators the ability to mitigate risks and make changes before a problem becomes a disaster.
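As an illustration of alarming on live data, the sketch below classifies each power reading by its load as a fraction of the device's capacity and sorts the results into one summary view, worst first. The `Reading` type, the 80% warning and 95% critical thresholds and the sample figures are assumptions for the example; in practice the readings would come from polling UPSs and PDUs, which is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    """A single hypothetical power reading, as might be polled from a PDU or UPS."""
    device: str
    location: str
    load_kw: float
    capacity_kw: float

    @property
    def utilisation(self) -> float:
        return self.load_kw / self.capacity_kw


def classify(reading: Reading, warn_at: float = 0.8, critical_at: float = 0.95) -> str:
    """Map a reading to an alarm level using assumed utilisation thresholds."""
    if reading.utilisation >= critical_at:
        return "CRITICAL"
    if reading.utilisation >= warn_at:
        return "WARNING"
    return "OK"


def dashboard(readings: List[Reading]) -> List[str]:
    """Build a single summary view: one line per device, highest utilisation first."""
    rows = sorted(readings, key=lambda r: r.utilisation, reverse=True)
    return [
        f"{classify(r):8} {r.device:8} {r.location:8} {r.load_kw:5.1f}/{r.capacity_kw:5.1f} kW"
        for r in rows
    ]


readings = [
    Reading("pdu-1", "row-A", 6.9, 7.4),    # about 93% utilised
    Reading("pdu-2", "row-A", 3.1, 7.4),    # about 42% utilised
    Reading("ups-1", "hall-1", 57.5, 60.0), # about 96% utilised
]
for line in dashboard(readings):
    print(line)
```

Sorting by utilisation rather than absolute load puts the devices closest to their limits at the top, whatever their size.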
Simulating power failures
Being able to simulate power failures by switching devices off in a model, without affecting the production environment, is critical, as it allows organisations to draw up a well thought-out action plan for recovering services.
Time and time again, data centre operators have assumed that their power chain and back-up systems are foolproof and have skipped failover testing, only to find themselves making headlines for all the wrong reasons.
Simulations can also help locate where redundancy is lacking and uncover single points of failure.
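A failure simulation can be run against a model of the power chain rather than the live equipment. In this minimal sketch, each device lists the devices that can supply it, and a device is assumed to stay powered if at least one of its suppliers is powered. Switching each device off in turn then reveals which loads lose power, that is, the single points of failure. The supply map, the device names and the "any one supplier is enough" rule are simplifying assumptions; transfer switches, battery runtime and generator start-up are not modelled.

```python
from typing import Dict, List, Set

# Hypothetical supply map: device -> devices that can supply it (sources have none).
SUPPLIERS: Dict[str, List[str]] = {
    "utility": [],
    "generator": [],
    "ups-a": ["utility", "generator"],
    "ups-b": ["utility", "generator"],
    "pdu-a": ["ups-a"],
    "pdu-b": ["ups-b"],
    "server-1": ["pdu-a", "pdu-b"],  # dual-corded
    "server-2": ["pdu-a"],           # single-corded
}
LOADS = ["server-1", "server-2"]


def powered(suppliers: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Devices that still have power when every device in `failed` is switched off."""
    on: Set[str] = set()
    changed = True
    while changed:  # propagate power until no further device switches on
        changed = False
        for device, feeds in suppliers.items():
            if device in on or device in failed:
                continue
            if not feeds or any(f in on for f in feeds):
                on.add(device)
                changed = True
    return on


def single_points_of_failure(suppliers: Dict[str, List[str]],
                             loads: List[str]) -> Dict[str, List[str]]:
    """Fail each device in turn and record which other loads lose power as a result."""
    baseline = powered(suppliers, set())
    result: Dict[str, List[str]] = {}
    for device in suppliers:
        survivors = powered(suppliers, {device})
        lost = [l for l in loads if l in baseline and l not in survivors and l != device]
        if lost:
            result[device] = lost
    return result


print(single_points_of_failure(SUPPLIERS, LOADS))
# {'ups-a': ['server-2'], 'pdu-a': ['server-2']}
```

Here the dual-corded `server-1` survives every single failure, while the single-corded `server-2` depends on `ups-a` and `pdu-a`, which is exactly the kind of missing redundancy a simulation is meant to expose.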
Avoiding power overload
To ensure that a data centre is being supplied with the right amount of power and that it is not being overloaded, IT personnel and facility managers must work together and share information.
This ensures that everyone on the wider team can assess what hardware is installed, what is being added or removed, and how much power each component needs. It is this information that prevents an overload from bringing down the system.
Documenting this process is extremely beneficial, as it helps ensure that information is shared consistently. Everyone can then look back on what has been done and improve procedures to avoid future disruption.
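The shared information can feed a simple pre-change check: before hardware is added or removed, compare the circuit's projected draw against its rated capacity with a safety margin. In the sketch below, the `Circuit` type, the `check_change` helper, the equipment wattages and the 80% margin are illustrative assumptions; the appropriate margin depends on local electrical codes and equipment ratings, and expected draw should come from measured or manufacturer figures.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Circuit:
    """A hypothetical PDU branch circuit and the equipment plugged into it."""
    name: str
    rated_watts: float
    loads: Dict[str, float] = field(default_factory=dict)  # equipment name -> expected draw (W)


def check_change(circuit: Circuit,
                 add: Optional[Dict[str, float]] = None,
                 remove: Iterable[str] = (),
                 margin: float = 0.8) -> Tuple[bool, float]:
    """Return (within_margin, projected_watts) for a proposed add/remove change."""
    removed = set(remove)
    loads = {name: watts for name, watts in circuit.loads.items() if name not in removed}
    loads.update(add or {})
    projected = sum(loads.values())
    return projected <= margin * circuit.rated_watts, projected


# Assumed 230 V x 16 A single-phase circuit with illustrative equipment figures.
circuit = Circuit("pdu-a/branch-1", 230 * 16,
                  {"server-1": 900.0, "server-2": 900.0, "switch-1": 150.0})

print(check_change(circuit, add={"storage-1": 1200.0}))                        # (False, 3150.0)
print(check_change(circuit, add={"storage-1": 1200.0}, remove=["server-2"]))  # (True, 2250.0)
```

Because the check works on a copy of the load list, a rejected change leaves the recorded circuit untouched, and the result can be logged alongside the change record.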
Identifying long term trends
As critical as up-to-the-minute information is, it’s also vital for organisations to analyse data centre performance over a long period to identify trends and patterns that can inform long-term forecasting. This allows organisations to plan for change and fluctuations, balance loads, predict future capacity needs, plan workflows and schedule services.
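One simple way to turn long-term data into a forecast is to fit a straight-line trend to historical peak load and estimate when it will reach capacity. The sketch below writes out an ordinary least-squares fit by hand; the monthly figures and the 500 kW capacity are invented for illustration, and a linear trend ignores seasonality, so it is a starting point rather than a full capacity-planning model.

```python
from typing import List, Optional, Tuple


def fit_trend(values: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through (index, value) pairs; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until(values: List[float], capacity: float) -> Optional[float]:
    """Periods after the last observation until the trend line reaches `capacity`."""
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None  # flat or falling load: no crossing predicted
    crossing = (capacity - intercept) / slope  # index at which the trend hits capacity
    return max(0.0, crossing - (len(values) - 1))


monthly_peak_kw = [400.0, 410.0, 420.0, 430.0, 440.0, 450.0]  # illustrative history
print(fit_trend(monthly_peak_kw))                       # (10.0, 400.0)
print(months_until(monthly_peak_kw, capacity=500.0))    # 5.0
```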
Identifying vulnerabilities
Traditionally, security has not fallen within the remit of the facility manager; it has always been under the watchful eye of the IT department. However, now that data centres have more, and stronger, connections to the outside world, they are more exposed to attack.
To defend against this, organisations must make sure that credentials such as passwords are changed regularly, and that outside contractors have access only to the devices they need, and certainly not to anything that can shut systems down.
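The two controls described above, least-privilege access for contractors and regular credential changes, can be expressed as simple checks. In this sketch the roles, device names, the `POLICY` table and the 90-day rotation interval are hypothetical; unknown roles are denied by default, and the contractor role is granted read-only access to the cooling units it services and nothing that can power equipment off.

```python
from datetime import date
from typing import Dict, List

# Hypothetical policy: which devices each role may touch and which actions it may perform.
POLICY = {
    "facilities-engineer": {"devices": {"*"}, "actions": {"read", "configure", "power_off"}},
    "hvac-contractor": {"devices": {"crac-1", "crac-2"}, "actions": {"read"}},
}


def allowed(role: str, device: str, action: str) -> bool:
    """Deny by default; allow only actions explicitly granted to the role on that device."""
    rule = POLICY.get(role)
    if rule is None:
        return False
    device_ok = "*" in rule["devices"] or device in rule["devices"]
    return device_ok and action in rule["actions"]


def stale_credentials(last_changed: Dict[str, date], today: date,
                      max_age_days: int = 90) -> List[str]:
    """Accounts whose password has not been changed within `max_age_days` (assumed policy)."""
    return sorted(
        account for account, changed in last_changed.items()
        if (today - changed).days > max_age_days
    )


print(allowed("hvac-contractor", "crac-1", "read"))       # True
print(allowed("hvac-contractor", "pdu-1", "power_off"))   # False
print(stale_credentials({"pdu-1-admin": date(2017, 1, 1), "ups-1-admin": date(2017, 8, 1)},
                        today=date(2017, 9, 1)))          # ['pdu-1-admin']
```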
Organisations should also consider a proven power management approach, which can be delivered through a data centre infrastructure management (DCIM) solution.
A DCIM solution enables IT and facility personnel to run the data centre at peak efficiency, helps all stakeholders improve overall operations, and identifies vulnerabilities to keep the power chain safe.
For many organisations, a power outage is the worst-case scenario, and reading in the newspaper about yet another company suffering a ‘failure’ makes it clear why. Reputation is what keeps customers coming back, and if it is damaged, the harm to future profits can be as great as that of the downtime itself, if not greater.
However, by keeping a watchful eye over the power chain, simulating worst-case scenarios and identifying vulnerabilities, organisations can focus on profit and let the technology do its job.
Sourced from Robert Neave, co-founder, chief technology officer and vice president of Product Management for Nlyte Software, the data centre infrastructure management (DCIM) solution provider