There is a crisis in the data centre. But not the kind of crisis IT managers are used to addressing, such as a chronic shortage of disk space or a tortuous systems upgrade.
The issue here is pure physics: the relationship between heat, air and electricity. For data centre managers, the challenge is simply stated but hard to meet – to get these in the right place at the right time.
As organisations have consolidated their earlier investments in distributed systems or simply expanded their already-centralised processor power into row upon row of rack-mounted systems, they have encountered some major problems.
Unfamiliar with the notion that data centres can actually run out of core physical capacity other than floor space, organisations are over-filling their racks and laying out their data centres without due consideration. As a result, systems are running hot – so hot that they are ‘melting'. Not literally, of course – but that is how the rack vendors like to dramatise the disruption of a temperature-triggered shutdown.
The cooling equipment – CRAC (computer room air conditioning) in data centre-speak – is increasingly unable to extract heat fast enough to prevent systems failure.
"This is the number one issue, the core topic of conversation in every data centre right now," says Colin Hopkins head of services at BT's Telehousing customer data centres. "All the technical questions we get are about power and cooling."
Aaron Davis is seeing the same issue. As a vice president at American Power Conversion (APC), he is witnessing the impact server compaction is having at customer sites.
"Server consolidation may look like a panacea for a lot of problems of distributed systems, but it breeds a cooling problem and a power consumption problem that a lot of IT guys simply don't know about," says Davis.
For one, very few internal IT managers have looked hard at the economics of data centres. "People think they can get more and more processing power for much the same price, simply by increasing the amount they pack in a rack," says Willie McBearty, power and infrastructure manager at BT. "Because of space issues, they have been hooked on the idea of footprint as the unit of cost."
The new cost unit is the electricity needed to power and to cool, a factor that was rarely seen as a problem in the past because data centres were designed to cope with the capacity requirements of large, standalone servers and mainframes.
Today, in co-location data centres, for example, a high-end customer's rack equipment might draw 2.5 kilowatts (kW) of power per square metre; on top of that, it takes around 1kW just to cool the equipment. If the user then decides to double the density of the equipment in the rack, increasing its power consumption to 4kW, the extra 2kW of cooling alone would cost over £2,000 a year. And that is just one rack – big organisations may have several hundred.
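As a rough illustration of that arithmetic, the sketch below works out the annual electricity bill for the extra cooling load. The tariff used – around 12p per kWh – is an assumption made for the sake of the example; the article quotes the cost figure, not the rate behind it.

```python
# Back-of-the-envelope cooling economics for one rack, as described above.
# The electricity tariff is an assumed value for illustration only.

HOURS_PER_YEAR = 24 * 365           # equipment and cooling run continuously
TARIFF_GBP_PER_KWH = 0.12           # assumed tariff; not quoted in the article

def annual_cost(load_kw: float) -> float:
    """Annual electricity cost of a continuous load of load_kw kilowatts."""
    return load_kw * HOURS_PER_YEAR * TARIFF_GBP_PER_KWH

extra_cooling_kw = 2.0              # extra cooling after doubling rack density
print(f"One rack:  ~GBP {annual_cost(extra_cooling_kw):,.0f} a year")
print(f"300 racks: ~GBP {annual_cost(extra_cooling_kw) * 300:,.0f} a year")
```

At that assumed rate, the additional 2kW of cooling comes to just over £2,000 a year, in line with the figure above; across several hundred racks it runs into hundreds of thousands of pounds.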
"There is a false impression that you can save money by sticking more and more servers into the same rack. But it doesn't pay to keep on doubling up power loads," says BT's McBearty.
Blow for blow
The problem stems from the fact that many data centres have never been designed to operate at that level of processing density. "Some data centre people are acutely aware of the issues, but others build out racks without understanding the implications, stacking them one blade above the other, with no air flow," says Richard Borton, group operations director of data centre hosting services company, Global Switch.
Worse, at a rack level, there is often a temptation to run servers back-to-back – but that results in one rack simply blowing hot air into the other.
At a unit level, unless engineers understand how different equipment deals with cooling, fans extracting heat from a server may be pushing hot air directly into the fan of, say, a neighbouring router or storage device, with drastic consequences.
The problem is being multiplied by other factors. "Increasing [processor] power density is leading to more cooling problems in the enclosure; compaction is causing more heat in the same space; smaller equipment often has the same power consumption as its larger predecessor; the presence of ever more cables is blocking airflow; and more equipment is being squeezed into fewer enclosures. That means that traditional air distribution methods are rarely adequate," says APC's Davis.
Others are observing the same effects. "In many corporate data centres, especially financial centres, space is all important. People are squeezing servers into tighter and tighter spaces and asking themselves how can we get them into a rack," says Paul Smith, UK country manager at remote management software vendor Avocent.
Server manufacturers have told customers they can pack dozens of units into a rack, but users are finding that they are having to leave rack spaces empty just to stop the whole enclosure burning out. And that is even before many organisations have embraced the emerging wave of even denser server technology.
The arrival of blade servers has prompted many to pack their racks even further. And although the devices share power supplies and cooling systems with other units, the sheer processing power on the board generates at least 50% more heat than standalone rack units.
"While before people would expect to put four to five units in a rack, you can now put 15 to 16 into the same chassis, which could give you 10kW to 15kW," says BT's McBearty. "The reality is that would melt the rack." BT has 23 data centres in the UK with five dedicated to customer space; it designs them to provide 1kW per square metre of power.
"You can pack in more blades but the [related] costs rise disproportionately. As does the risk of failure as the temperature rises. When it comes to compacting technologies such as blades into a rack, the only saving you are making is on the footprint. The real cost is air and power," says McBearty.
APC's Aaron Davis sums it up: "Yesterday's physical infrastructure strategy plus tomorrow's server strategy equals disaster."
Blades draw two to five times the per-rack power of existing technology, he says, and generate correspondingly increased heat output. "With blades, the constraint on growth becomes your ability to power and to cool," he says.
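The mismatch is easy to see from the published numbers. The minimal sketch below compares a rack's heat load with what a floor designed to BT's stated 1kW per square metre can absorb; the rack footprint used is an assumed figure, not one from the article.

```python
# Comparing rack heat load with a floor's design cooling density.
# The design density (1kW per square metre) and the blade-rack loads
# (10kW to 15kW) come from the article; the footprint is assumed.

DESIGN_KW_PER_SQM = 1.0       # design power/cooling density of the floor
RACK_FOOTPRINT_SQM = 2.5      # assumed area per rack, including aisle space

def supportable_kw(footprint_sqm: float = RACK_FOOTPRINT_SQM) -> float:
    """Heat load the floor's design density can handle for one rack."""
    return DESIGN_KW_PER_SQM * footprint_sqm

for load_kw in (4.0, 10.0, 15.0):   # dense rack-mount vs fully bladed racks
    shortfall = load_kw - supportable_kw()
    verdict = "within design" if shortfall <= 0 else f"{shortfall:.1f}kW over design"
    print(f"{load_kw:4.1f}kW rack: {verdict}")
```

Even with a generous footprint allowance, a fully bladed rack overshoots the design density several times over – which is why half-filled racks and supplementary cooling are becoming common.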
Heat sink
The solution to these issues is not obvious. In many data centres, managers are simply going back to the drawing board, redesigning the layout of their ‘real estate' to ensure that they have enough cooling. It may not sound like rocket science, but most have realised they need to arrange their systems so that heat is pumped into designated hot aisles and air drawn from cool aisles.
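The arrangement itself is simple to describe: rows of racks alternate orientation so that each aisle is shared either by two rows of intakes or by two rows of exhausts, rather than one row blowing into the back of the next. The toy sketch below just prints such a layout; the number of rows is arbitrary and purely illustrative.

```python
# Toy sketch of a hot-aisle/cold-aisle floor plan: rack rows alternate
# orientation so every aisle serves either two intake faces (a cold aisle
# fed by the CRAC units) or two exhaust faces (a hot aisle returning heat).

def floor_plan(rows: int) -> list[str]:
    """Aisles and rack rows, listed from one side of the floor to the other."""
    plan = ["COLD aisle (CRAC supply)"]
    for r in range(rows):
        if r % 2 == 0:
            # Intake faces the cold aisle listed above, exhaust faces the hot aisle below
            plan.append("rack row: [ intake | exhaust ]")
            plan.append("HOT aisle (return air)")
        else:
            # Exhaust faces the hot aisle listed above, intake faces the cold aisle below
            plan.append("rack row: [ exhaust | intake ]")
            plan.append("COLD aisle (CRAC supply)")
    return plan

for strip in floor_plan(4):
    print(strip)
```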
But that may be insufficient as vendors are now designing server systems that will demand massively more concentrated power: 20kW or more if installed in a single rack.
Reacting to the evolving situation, APC has thrown its data centre expertise into InfraStruXure, a power and cooling enclosure that can create a safe running environment for anything from 10 to more than 100 racks.
The company says that such a modular ‘network-critical physical infrastructure' architecture can save a minimum of 35% of data centre running costs.
As in almost all such cases, there are related issues of space. One early InfraStruXure user, Deloitte in the Netherlands, runs 168 racks in its data centre. Historically, because of heat issues, it was only able to fit three CPUs in a rack; after implementing the purpose-built APC environment, that soared to 72 CPUs per rack, says CIO Erik Ubels. "That saved a huge amount of room," he says.
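For a sense of scale, the back-of-the-envelope calculation below shows what that jump in density implies, on the simplifying (and unstated) assumption that the total CPU count stayed the same; the article does not say how Deloitte actually used the reclaimed capacity.

```python
# Rough sense of scale for the Deloitte density figures quoted above.
# Assumes, purely for illustration, that the total CPU count is unchanged.

import math

RACKS_BEFORE = 168
CPUS_PER_RACK_BEFORE = 3
CPUS_PER_RACK_AFTER = 72

total_cpus = RACKS_BEFORE * CPUS_PER_RACK_BEFORE            # 504 CPUs
racks_after = math.ceil(total_cpus / CPUS_PER_RACK_AFTER)   # 7 racks

print(f"{total_cpus} CPUs: {RACKS_BEFORE} racks before, {racks_after} after")
```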
Colin Hopkins at BT is seeing the same set of dynamics. "The data centre industry is moving away from a real estate model for processing to a power and cooling model."
Other rack specialists seem to have gone full circle to beat the heat. Germany's Knürr AG recently launched a water-cooled server rack, the modern relative of the systems that once cooled mainframes. Its platform provides heat exchangers that can soak up 20kW in a closed, deep-freeze-like cabinet.
Unplugged
But such devices make the assumption that the data centre can get enough electricity to where it is needed. As racks have been sucking up more and more kilowatts, data centres, both in-house and outsourced, are reaching a ceiling on the power they can supply to an individual rack – another reason why some have to leave racks half filled.
"Organisations are reaching the capacity of the power they can get to their racks," says BT's Hopkins. "People are asking if there is enough power to go round."
That underscores the fact that the data centre's physical infrastructure has become a critical business issue – and one that might not be solvable without resorting to seemingly crude but effective methods: water-cooling or refrigeration.
As Davis at APC says: "We may get to a situation when density is such that air is not up to the job."