Solving the Heat and Power Problems Raised By Blade Servers

By Walter Schwarz, HVAC/R Manager, and Earl Zmijewski, Director of Computer Services, Fluent Incorporated, Lebanon, N.H. February 25, 2005

As blade servers become more powerful and businesses pack more and more of them into data centers, cooling the thin devices is becoming more of a challenge. A rack of blade servers can generate as much as 14 kilowatts of heat, nearly as much as two electric ovens give off. The simplest solution is to add air conditioning capacity, but this is expensive and can’t prevent failures on the inevitable occasions when the air conditioning temporarily fails.

And while adding air conditioning may be necessary, it also makes sense to evaluate the airflow conditions in the room in order to determine whether or not the existing capacity is being effectively utilized. For example, it is quite possible that cold air entering the room might be quickly exiting through the returns without cooling the servers, a process known as short-circuiting. Computer simulation can help optimize the effectiveness of the air-conditioning system by determining the airflow in the room while also evaluating the impact of potential changes in equipment location, heat dissipation, diffuser configuration, and ventilation system parameters. In short, numerical simulation can be used to develop an airflow configuration that is best suited to the individual needs of any existing or planned facility.

Early blade servers offered relatively low power consumption by using lower-power processors, but as blade servers have penetrated the enterprise market in recent years, they have become increasingly powerful. A standard server cabinet dissipates on the order of 3 kilowatts of power, while partitioned blade servers can dissipate up to 5 times as much. “Enterprises must… carefully factor in the power and cooling demands of these blades,” says a recent Gartner Group report. “Many organizations will not be able to achieve optimum density because of environmental limitations.” ASHRAE’s Thermal Guidelines for Data Processing Environments (2004) says that temperatures should be maintained between 68°F and 77°F in the server room. As temperatures rise, failure becomes a real possibility, which can lead to downtime and even data loss.

Effect of airflow on room temperatures

While it is clear that many companies will need to make expensive expansions to their air conditioning systems in order to accommodate increases in power density, simply increasing air conditioning capacity is not the most efficient way to address this problem. The effectiveness of an air conditioning system depends on the degree of heat exchange taking place within the room. In a room designed for human habitation, the goal is generally to keep the entire room as close to a constant temperature as possible. When that goal is met, the hottest air is usually the air that has been in the room the longest before being removed through the return. Air that is recirculated within the room without being removed will only get hotter, while air that does not properly circulate will be removed at too low a temperature. In a data center, which is designed for computers rather than humans, the goal is instead to keep the critical equipment within acceptable temperature ranges, both while the air conditioning is operating and, during a failure, for long enough to restore power or make repairs. The ideal cooling arrangement usually comes from minimizing the temperature of the air at the ventilation inlet of the equipment and then removing the hot air after it has left the equipment outlet.
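
To give a feel for the quantities involved, the sketch below applies the standard sensible-heat balance (heat removed = air density × volume flow × specific heat × temperature rise) to estimate how much air must pass through a rack to carry its heat away. The 14 kW and 3 kW loads come from the figures quoted earlier; the air properties and the 20°F inlet-to-outlet temperature rise are assumptions chosen for illustration, not values from the Fluent study, and the function name is hypothetical.

```python
# Sensible-heat balance: Q = rho * V_dot * cp * dT
# Air properties below are assumed (roughly sea level, room temperature).
AIR_DENSITY = 1.2       # kg/m^3
AIR_CP = 1005.0         # J/(kg*K)
M3S_TO_CFM = 2118.88    # cubic feet per minute in one cubic metre per second


def required_airflow_cfm(heat_w, delta_t_f):
    """Airflow (cfm) needed to remove heat_w watts with a delta_t_f (deg F) rise."""
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    delta_t_k = delta_t_f * 5.0 / 9.0          # a Fahrenheit difference in kelvin
    flow_m3s = heat_w / (AIR_DENSITY * AIR_CP * delta_t_k)
    return flow_m3s * M3S_TO_CFM


ASSUMED_RISE_F = 20.0   # illustrative inlet-to-outlet rise, not from the article

blade_rack_cfm = required_airflow_cfm(14_000, ASSUMED_RISE_F)    # about 2,200 cfm
standard_rack_cfm = required_airflow_cfm(3_000, ASSUMED_RISE_F)

print(f"14 kW blade rack:   {blade_rack_cfm:,.0f} cfm")
print(f" 3 kW standard rack: {standard_rack_cfm:,.0f} cfm")
```

Under these assumptions the required airflow scales directly with heat load, so a blade rack needs several times the cold air of a standard cabinet at the same inlet temperature. If part of the supply air short-circuits to the returns, the air that does reach the rack has to absorb the same heat with less flow, and the outlet temperature rises accordingly.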

Computational fluid dynamics (CFD) is increasingly being used to determine the flow of air through data centers. A CFD simulation provides fluid velocity, pressure, temperature, and other variables, as appropriate, throughout the solution domain for problems with complex geometries and boundary conditions. As part of the analysis, the user may change the geometry of the system or the boundary conditions and observe the effect of the changes on fluid flow patterns or distributions of other variables. The process of simulating airflow in a data center has been greatly simplified by the development of Airpak, CFD software from Fluent Inc., Lebanon, NH, designed specifically for modeling internal building flows. Mouse-driven selection, placement, and sizing of predefined objects such as rooms, people, fans, partitions, vents, openings, sources, resistances, and ducts make model building fast. Fully automatic unstructured mesh generation makes it possible to model complex geometries with ease.

Modeling a typical data center

The data center at Fluent provides a typical example of how CFD can help avoid overheating failures while minimizing the need for expensive air conditioning equipment upgrades. In this facility, little attention had been paid to cooling the data center room, since most computing resources were deployed in individual offices. Maintaining optimal cooling conditions in the data center had proved difficult, but the problem was largely ignored because all of the equipment worked as intended. Then one summer day, an air conditioning unit failed, and before it could be repaired one corner of the room got too hot. A disk drive in a RAID array failed, and while it was being rebuilt onto a hot spare, a second drive failed. The result was the temporary loss of 500 GB of data. The data was recovered from a backup tape, but the restore took a considerable amount of time. For this reason, and because plans existed to substantially increase the amount of power-dissipating equipment in the room, the director of computer services asked for a simulation to be performed to avoid future situations of this type.

Fluent engineers working with clients in the HVAC industry used Airpak to quickly create a virtual model of the data center room, which is about 1000 square feet and has a power density of about 37 watts per square foot. The actual power of the equipment was directly measured and used in the simulation. Direct measurements provide greater accuracy than equipment nameplate ratings, which are often conservative and can lead to overdesign of the air conditioning system. The first analysis was of the existing system, which included three heat pumps rated at 53,500 Btu per hour and three 4-way ceiling diffusers which each deliver 2000 cubic feet per hour of air. The diffusers discharged air parallel to the ceiling in four directions, a configuration that is designed specifically for rooms occupied by people. The heat-generating sources in the room included a number of racks carrying 90 1U (1.75”) and 36 2U (3.5”) blade servers running Windows and Linux, 14 Unix servers, 25 disk arrays, and a number of odds and ends.
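
The room figures quoted above allow a quick back-of-the-envelope comparison of heat load and nominal cooling capacity. The sketch below is only that: it assumes the 53,500 Btu per hour rating applies to each of the three heat pumps, uses the standard conversion of about 3.412 Btu per hour per watt, and ignores lighting, people, and heat gain through the walls. The variable and function names are hypothetical.

```python
BTU_PER_HOUR_PER_WATT = 3.412142   # standard unit conversion

# Figures quoted in the article
ROOM_AREA_SQFT = 1000
POWER_DENSITY_W_PER_SQFT = 37
HEAT_PUMP_COUNT = 3
HEAT_PUMP_RATING_BTUH = 53_500     # assumed here to be the rating of EACH unit


def cooling_kw(units_running):
    """Nominal cooling capacity in kW with the given number of units running."""
    total_btuh = units_running * HEAT_PUMP_RATING_BTUH
    return total_btuh / BTU_PER_HOUR_PER_WATT / 1000.0


heat_load_kw = ROOM_AREA_SQFT * POWER_DENSITY_W_PER_SQFT / 1000.0

print(f"Equipment heat load: {heat_load_kw:.1f} kW")
for running in range(HEAT_PUMP_COUNT, 0, -1):
    capacity = cooling_kw(running)
    margin = capacity - heat_load_kw
    print(f"{running} unit(s) running: {capacity:5.1f} kW capacity, "
          f"margin {margin:+.1f} kW")
```

On these assumptions, the three units together nominally exceed the equipment load, but any two of them do not. That is consistent with the room overheating when a single unit failed, and it shows why nominal capacity alone says little: the margin only helps if the cold air actually reaches the equipment inlets.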

Simulation identifies airflow problems

The analysis results showed that cold air was flowing from some of the diffusers straight into the returns without cooling the equipment. They also showed wide variability in the temperature of the air moving into the inlets of the servers, with temperatures in some cases above the ASHRAE guideline range even with all air conditioning systems operating as intended. Additional simulations were run to model the situation where one or more of the heat pumps failed, and they showed, as expected, temperatures high enough to rapidly cause equipment to fail. The managers were impressed with the close agreement between the equipment temperature readouts they could see in the room and the temperatures predicted by the CFD simulation.
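
A check of this kind, comparing inlet temperatures against the 68°F to 77°F range cited from the ASHRAE guidelines, is simple to automate once inlet temperatures are available from a simulation or from equipment readouts. The sketch below is a minimal illustration; the rack names and temperature values are invented sample data, not results from the Fluent study, and the function names are hypothetical.

```python
ASHRAE_MIN_F = 68.0   # lower limit cited in the article
ASHRAE_MAX_F = 77.0   # upper limit cited in the article


def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def out_of_range_inlets(inlet_temps_f):
    """Return (name, temp) pairs outside the ASHRAE range, hottest first."""
    flagged = [
        (name, temp)
        for name, temp in inlet_temps_f.items()
        if not (ASHRAE_MIN_F <= temp <= ASHRAE_MAX_F)
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented sample inlet temperatures in deg F, for illustration only
sample_inlets_f = {
    "rack-A": 70.5,
    "rack-B": 76.0,
    "rack-C": 81.2,
    "rack-D": 84.0,
}

for name, temp_f in out_of_range_inlets(sample_inlets_f):
    print(f"{name}: {temp_f:.1f} F ({fahrenheit_to_celsius(temp_f):.1f} C) "
          f"is outside {ASHRAE_MIN_F:.0f}-{ASHRAE_MAX_F:.0f} F")
```

The spread between the coolest and hottest inlet is as informative as the individual values: a wide spread with adequate total capacity points to an air distribution problem rather than a shortage of cooling.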

Additional simulations were performed in an effort to improve the existing conditions. The four-way ceiling diffusers were replaced in the simulation with diffusers that discharged cool supply air vertically down towards the floor. Both the equipment and diffusers were rearranged to provide a hot aisle/cold aisle arrangement, with the inlets of all the equipment oriented towards the cold aisles that were directly fed by the diffusers. The hot aisles, on the other hand, were positioned under the returns, which were left in their existing positions because they are more expensive to move. The simulation of the new design showed much lower temperatures at the equipment ventilation inlets, while higher temperatures were seen near the equipment air outlets, where they cause no harm. Other analyses were performed to assess the vulnerability of equipment in the room to hypothetical failure scenarios, such as the loss of refrigerant, and to evaluate the impact of planned increases in the computer equipment contained in the room. These simulations showed that the ventilation changes dramatically improved the effectiveness of the existing air conditioning equipment, to the point that the new arrangement of equipment and vents would have prevented the previous failures. The simulations also showed that the planned additions in computer equipment would require an increase in air conditioning capacity, but that the new arrangement required much less additional capacity than the old one.
