Oliver Goodman has served as head of engineering at global data centre service provider Telehouse since January 2020, and has seen first-hand how AI has helped to solve operational challenges within the sector. This Q&A explores the biggest obstacles the sector has needed to overcome, as well as how data centres have been using AI to drive efficiency and maintain resilience.
What are the biggest challenges that data centre operations have faced within the last year, and how can AI help companies in the sector overcome them?
While every business has been impacted by the pandemic in one way or another, the data centre industry has remained buoyant as digital transformation has accelerated. We’ve seen a surge in customer demand and growth in every service we offer, so the biggest challenge right now, for us and for the industry, is dealing with that growth while adapting to workplace restrictions.
Balancing a growing market against tighter resources means data centre operators are looking to shift to more intelligent decision making and to expand automation that guarantees uptime and effective functionality. This is where AI can help.
Many data centres rely heavily on manual operations to inform operational decisions, but by applying AI we hope to implement more automation, for example in the transfer of load or intelligent switching between redundant and resilient equipment. Using the data we collect on facility temperature, humidity and how “hard” infrastructure is working, AI could help us understand what could be done to extend the serviceable life of equipment, and whether savings can be made in energy efficiency, capital expenditure on upgrades and part replacements.
How has AI been driving energy efficiency and helping the data centre space to become more environmentally friendly?
Data centre operators have extensive data management systems involved in the collection, aggregation and visualisation of data that can help us analyse all kinds of factors, such as customer load, aisle temperatures and humidities in each data hall. AI takes that data and performs an action based on certain trigger points.
If the customer load crosses a certain level, cooling infrastructure can be ramped up or down to provide sufficient cooling in the most energy-efficient way. This is preferable to keeping that plant running at 100% just in case the load goes up. Machine learning can also be used to predict these events, dependent on a number of other factors (e.g. external ambient temperatures), so the control systems can react accordingly and automatically.
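As an illustration of trigger-point control, here is a minimal Python sketch. The function name, the four cooling stages and the 80%/50% thresholds are hypothetical values chosen for the example, not figures from Telehouse. It steps cooling up when load passes an upper trigger and down when it falls below a lower one, with a gap between the two so the plant does not cycle on small fluctuations.

```python
def next_cooling_stage(current_stage, load_fraction, max_stage=4,
                       step_up_at=0.8, step_down_at=0.5):
    """Return the cooling stage to run next.

    load_fraction is the IT load divided by the cooling capacity of the
    current stage. All thresholds are illustrative assumptions.
    """
    if load_fraction >= step_up_at and current_stage < max_stage:
        return current_stage + 1   # load is close to capacity: ramp up
    if load_fraction <= step_down_at and current_stage > 1:
        return current_stage - 1   # plenty of headroom: ramp down
    return current_stage           # inside the dead band: hold steady


if __name__ == "__main__":
    stage = 2
    for load in (0.65, 0.85, 0.60, 0.40):
        stage = next_cooling_stage(stage, load)
        print(f"load {load:.0%} -> stage {stage}")
```

A predictive version would feed a forecast load fraction, for instance one adjusted for external ambient temperature, into the same function rather than the measured value.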
Most data centre control systems already use AI to an extent to control and improve energy efficiency. For example, an uninterruptible power supply can automatically change from one efficiency mode to another depending on the load of the system. The AI/control systems will turn off redundant modules and put them into hibernation where appropriate, ensuring the system runs as close as possible to the optimum efficiency for the actual load at any given time.
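The module hibernation logic can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name, the module ratings and the single spare module (N+1) are hypothetical, and real UPS firmware will weigh many more factors.

```python
import math


def ups_modules_to_run(load_kw, module_kw, installed, redundant=1):
    """Return (active, hibernated) module counts for the given load.

    Keeps enough modules online to carry the load plus `redundant`
    spares, and hibernates the rest. All parameters are assumptions
    made for this example.
    """
    if module_kw <= 0 or load_kw < 0:
        raise ValueError("module_kw must be positive and load_kw non-negative")
    needed_for_load = max(1, math.ceil(load_kw / module_kw))
    active = min(installed, needed_for_load + redundant)
    return active, installed - active


if __name__ == "__main__":
    # Six 50 kW modules installed, 120 kW of actual load.
    print(ups_modules_to_run(120, 50, 6))
```

With fewer modules sharing the load, each active module runs at a higher fraction of its rating, which is typically where conversion efficiency is better.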
Every year these control systems get better, but there is a growing expectation on the manufacturers to come up with highly efficient systems that we can build a level of automation on top of to help us achieve maximum efficiency gains.
How can the monitoring of data centre network traffic leverage AI to maintain resilience against cyber threats?
Developments in network management AI and cyber security are allowing us to detect unusual activity outside of usual traffic patterns. In a typical office environment, if a company device logs in at 3am and starts taking gigabytes of data from the business, that will be flagged as atypical behaviour. AI can analyse this breach quickly and respond by disabling that device’s network access to stop the possible data loss.
That data transfer could also take place in the middle of the working day, but it might come from a device that would not normally transfer that volume of data, such as a laptop solely used for presentations. The AI already understands the typical behaviour patterns of that device and will flag when there might be inflow or outflow of data that does not fit its typical usage pattern.
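One simple way to express "does not fit its typical usage pattern" is a per-device statistical baseline. The Python sketch below is an assumption for illustration, not a description of any particular security product. It flags a transfer when it sits more than a chosen number of standard deviations from that device's own history; the function name and the threshold of 3 are hypothetical.

```python
from statistics import mean, stdev


def is_unusual_transfer(history_mb, observed_mb, threshold=3.0):
    """Flag a transfer that deviates strongly from a device's own baseline.

    history_mb: past transfer volumes for this device, in megabytes.
    threshold:  how many standard deviations count as unusual (assumed).
    """
    if len(history_mb) < 2:
        return False               # not enough history to judge
    mu = mean(history_mb)
    sigma = stdev(history_mb)
    if sigma == 0:
        return observed_mb != mu   # perfectly flat baseline
    return abs(observed_mb - mu) / sigma > threshold


if __name__ == "__main__":
    presentation_laptop = [40, 55, 50, 45, 60]   # typical daily MB
    print(is_unusual_transfer(presentation_laptop, 52))     # normal day
    print(is_unusual_transfer(presentation_laptop, 5000))   # sudden bulk transfer
```

Because the baseline is kept per device, the same 5,000 MB transfer could be routine for a backup server and still be flagged for a laptop that normally moves a few tens of megabytes.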
In a data centre it is no different. Every server has its own typical operational pattern; these can be monitored by the cyber security systems, and any unusual activity can be flagged. It is possible to take this further than simple network monitoring by interfacing with other systems. For example, we could detect whether a server’s behaviour changed after someone entered a secure server hall, which could indicate that the server has been tampered with. From a cyber security perspective, the possibilities of what you can achieve with AI are virtually limitless and this is a significant growth area.
In what ways can AI help data centre operators balance workloads to keep electricity costs as low as possible?
We’re constantly battling two eternal struggles: the cost of electricity per useful operation, and using energy efficiently. Data centre loads are increasing year-on-year, which means that our electricity bills go up year-on-year. The digitisation of the world means this won’t change anytime soon.
Data centre design is heavily geared around optimising for where we believe the IT load will settle in the long run. We need the load to be at a level that allows the infrastructure to operate as efficiently as possible for a given unit of useful output (rack kW in our customers’ case). If our customers are not using their contracted kW, we might have to run equipment such as chillers at 10% of their capacity, meaning they will run very inefficiently. So while we’re keeping electricity costs low, it’s actually stopping us from maximising our energy efficiency. We use data collection and AI to identify where loads are mismatched to the installed infrastructure and to modulate the output of our critical plant.
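Identifying mismatched loads can start from something as simple as comparing each item of plant against its rating. The Python sketch below is illustrative only: the plant names, the readings and the 30% cut-off for "inefficiently low" are assumptions, not Telehouse figures.

```python
def underloaded_plant(plant, min_efficient_fraction=0.3):
    """Return plant running below an assumed efficient load fraction.

    plant: mapping of name -> (current_load_kw, rated_capacity_kw).
    The 30% default is a placeholder; real equipment has its own curve.
    """
    flagged = {}
    for name, (load_kw, rated_kw) in plant.items():
        fraction = load_kw / rated_kw
        if fraction < min_efficient_fraction:
            flagged[name] = round(fraction, 2)
    return flagged


if __name__ == "__main__":
    readings = {
        "chiller-1": (50, 500),    # running at 10% of capacity
        "chiller-2": (300, 500),   # running at 60% of capacity
    }
    print(underloaded_plant(readings))
```

Plant flagged this way is a candidate for modulating output or consolidating load onto fewer units, so that what stays online runs closer to its efficient range.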
This data is then also fed back into future building designs. We have a responsibility to design our control systems in a way that allows us to improve energy efficiency throughout the lifetime of the building, including in the early, low-load stages. We’re walking that efficiency curve depending on what the load is. AI can be used very effectively in control systems to help us balance cost and efficiency, and this is improving over time.