
Data Centers

Comparing Data Center & Computer Thermal Design
By Michael K. Patterson, Ph.D., P.E., Member ASHRAE; Robin Steinbrecher; and Steve Montgomery, Ph.D.

About the Authors
Michael K. Patterson, Ph.D., P.E., is thermal research engineer, platform initiatives and pathfinding, at Intel's Digital Enterprise Group in Hillsboro, Ore. Robin Steinbrecher is staff thermal architect with Intel's Server Products Group in DuPont, Wash. Steve Montgomery, Ph.D., is senior thermal architect at Intel's Power and Thermal Technologies Lab, Digital Enterprise Group, DuPont, Wash.

The design of cooling systems and thermal solutions for today's data centers and computers is handled by skilled mechanical engineers using advanced tools and methods. The engineers work in two different areas: those who are responsible for designing cooling for computers and servers and those who design data center cooling. Unfortunately, a lack of understanding exists about each other's methods and design goals. This can lead to non-optimal designs and problems in creating a successful, reliable, energy-efficient data processing environment. This article works to bridge this gap and provide insight into the parameters each engineer works with and the optimizations they go through. A basic understanding of each role will help their counterparts in their designs, whether for a data center or a server.

Server Design Focus
Thermal architects are given a range of information to begin designing the thermal solution. They know the thermal design power (TDP) and temperature specifications of each component (typically junction temperature, T_J, or case temperature, T_C). Using a processor as an example, Figure 1 shows a typical component assembly.

The processor is specified with a maximum case temperature, T_C, which is used for design purposes. In this example, the design parameters are TDP = 103 W and T_C = 72°C. Given an ambient temperature specification (T_A) of 35°C, the required thermal resistance of this example would need to be equal to or lower than:

$$\Psi_{CA,\,required} = \frac{T_C - T_A}{TDP} = \frac{72 - 35}{103} \approx 0.36\ ^{\circ}\mathrm{C/W} \qquad (1)$$

Sometimes this value of Ψ_CA is not feasible. One option to relieve the demands of a thermal solution with a lower thermal resistance is a higher T_C. Unfortunately, the trend for T_C continues to decline. Reductions in T_C result in higher performance, better reliability, and less power used. Those advantages are worth obtaining, making the thermal challenge greater.

One of the first parameters discussed by the data center designer is the temperature rise for the servers, but this value is a secondary consideration, at best, in the server design. As seen in Equation 1, no consideration is given to chassis temperature rise. The thermal design is driven by maintaining component temperatures within specifications. The primary parameters are T_C, T_ambient, and Ψ_CA,actual. The actual thermal resistance of the solution is driven by component selection, material, configuration, and airflow volumes. Usually, the only time that chassis T_RISE

38 ASHRAE Journal ashrae.org April 2005


is calculated is to ensure that exhaust temperatures stay within safety guidelines.

In addition to TDP and T_C, the engineer has several other targets, including:

Cost: Servers are sold into very competitive markets and cost is a critical consideration. Today's budget for thermal solutions in servers is typically in the range of $50 to $75, depending on the number of processors and features. It is desirable to minimize this cost.

Weight: Current aluminum and copper heat sinks continue to expand in size and surface area to augment heat transfer. The increased weight of the heat sinks is a serious issue as the processor package and motherboard must be made sufficiently robust to handle the resulting mechanical load.

Volumetric: The space inside a server is extremely valuable, especially as more computing power and capabilities are added. Using this space for heat sinks and fans is not adding value for the customer.

Power: The total power required for servers is increasing and driving changes to the data center infrastructure. The server fans can use up to 10% of the server power. Reducing all power is a design goal.

Many components to cool: Ideally, sizing air movers to cool the highest power component would be sufficient to cool the remainder of the system. Unfortunately, this is rarely the case and additional fans, heat sinks, and ducting in the server often are required.

Reliability: Operational continuity is vital to the success of the data center, so server reliability receives significant focus. For the thermal solution, the items most likely to fail are air movers. These are typically redundant to provide for this increased reliability. Redundancy results in oversizing of air-mover capability for normal operation, leading to further inefficiencies.

Acoustics: The volume of air required to cool today's servers often creates a noise problem such that hearing protection may be required. The area of acoustics is important enough to describe further.

Figure 1: Thermal resistance of typical server thermal solution. (Stack from bottom to top: socket, processor package at T_case, thermal interface material, heat sink at T_sink, ambient air at T_ambient; Ψ_CA spans case to ambient.)

Server Thermal Acoustic Management
As mentioned previously, the thermal engineer designing the cooling and control system must counterbalance the need to cool all components in a system with the necessity of meeting acoustics requirements. To achieve this, the server management (SM) monitors combinations of temperature sensors and component use to take action to maintain the server within specifications.

Required air-mover speeds are determined through calculations performed by a baseboard management controller (BMC). The SM then acts to change the air-mover speeds to ensure that the components stay within specification. Consequently, the SM normally is driving a server to be as quiet as possible while maximizing performance by keeping component temperatures within, but not over, their limits. In some instances, SM enables a customer to choose performance over acoustics. In these cases, air movers are driven to the levels needed to achieve the highest thermal performance, which is prioritized over acoustics.

Acoustics specifications for computing equipment are specified at ambient temperatures, typically 23°C ± 2°C (73°F ± 4°F). Above this range, it is desirable, but not required, to have a quiet system. As a result, some systems attempt to maintain the quietest possible operation as a competitive advantage. Others sacrifice acoustics to reduce cost through the elimination of elaborate SM systems.

The data center designer must understand that, as a result of these SM schemes, the required airflow through a system is greatly reduced when room temperatures, or more specifically server inlet air temperatures, are held below 25°C (77°F). The temperature rise through a system may be relatively high as a result of that lower airflow. Typical systems are designed to deliver about 60% to 70% of their maximum flow in this lower inlet temperature environment. Monitoring of temperature sensors is accomplished via on-die thermal diodes or discrete thermal sensors mounted on the printed circuit boards (PCBs). Component utilization monitoring is accomplished through activity measurement (e.g., memory throughput measurement by the chipset) or power measurement of individual voltage regulators. Either of these methods results in calculation of component or subsystem power.

Data Center Design Focus
The data center designer faces a similar list of criteria for the design of the center, starting with a set of requirements that drive the design. These include:

Cost: The owner will have a set budget and the designer must create a system within the cost limits. Capital dollars are the primary metric. However, good designs also consider the operational cost of running the system needed to cool the data center. Combined, these comprise the total cost of ownership (TCO) for the cooling systems.

Equipment list: The most detailed information would include a list of equipment in the space and how it will be racked together. This allows for a determination of total cooling load in the space, and the airflow volume and distribution in the space.

Caution must be taken if the equipment list is used to develop the cooling load by summing up the total connected load. This leads to over-design. The connected load or maximum rating of the power supply is always greater than the maximum heat dissipation possible by the sum of the components. Obtaining the thermal load generated by the equipment from the supplier is the only accurate way of determining the cooling requirements. Unfortunately, the equipment list is not always available, and the designer will be given only a cooling load per unit area and will need to design the systems based upon this information. Sizing the cooling plant is straightforward when the total load is known, but the design of the air-handling system is not as simple.

Performance: The owner will define the ultimate performance of the space, generally given in terms of ambient temperature and relative humidity. Beaty and Davidson² discuss typical values of the space conditions and how these relate to classes of data centers. Performance also includes values for airflow distribution, total cooling, and percent outdoor air.

Reliability: The cooling system's reliability level is defined and factored into equipment selection and layout of distribution systems. The reliability of the data center cooling system requires an economic evaluation comparing the cost of the reliability vs. the cost of the potential interruptions to center operations. The servers protect themselves in the event of cooling failure. The reliability of the cooling system should not be justified based upon equipment protection.

Data Center Background
Experience in data center layout and configuration is helpful to the understanding of the design issues. Consider two cases at the limits of data center arrangement and cooling configuration:
1. A single rack in a room, and
2. A fully populated room, with racks side by side in multiple rows.

Case 2 assumes a hot-aisle/cold-aisle rack configuration, where the cold aisle is the server airflow inlet side containing the perforated tiles. The hot aisle is the back-to-back server outlets, discharging the warm air into the room. The hot-aisle/cold-aisle layout is the most prevalent configuration as the arrangement prevents mixing of inlet cooling and warm return air. The most common airflow configuration of individual servers is front-to-back, working directly with the hot-aisle/cold-aisle concept, but it is not the only configuration.

Consider the rack of servers in a data processing environment. Typically, these racks are 42U high, where 1U = 44.5 mm (1.75 in.). A U is a commonly used unit to define the height of electronics gear that can be rack mounted. The subject rack could hold 42 1U servers, or 10 4U servers, or other combinations of equipment, including power supplies, network hardware, and/or storage equipment. To consider the two limits, first take the described rack and place it by itself in a reasonably sized space with some cooling in place. The other limit occurs when this rack of equipment is placed in a data center where the rack is one of many similar racks in an aisle. The data center would have multiple aisles, generally configured front-to-front and back-to-back.

Common Misconceptions
A review of misconceptions illustrates the problems and challenges facing designers of data centers. During a recent design review of a data center cooling system, one of the engineers claimed that the servers were designed for a 20°C (36°F) T_RISE, inlet to outlet air temperature. This is not the case. It is possible that there are servers that, when driven at a given airflow and dissipating their nominal amount of power, may generate a 20°C (36°F) ΔT, but none were ever designed with that in mind.

Recall the parameters that were discussed in the section on server design. Reducing Ψ_CA can be accomplished by increasing airflow. However, this also has a negative effect. More powerful air movers increase cost, use more space, are louder, and consume more energy. Increasing airflow beyond the minimum required is not a desirable tactic. In fact, reducing the airflow as much as possible would be of benefit in the overall server design. However, nowhere in that optimization problem is ΔT across the server considered.

Assuming a simple T_RISE leads to another set of problems. This implies a fixed airflow rate. As discussed earlier, most servers monitor temperature at different locations in the system and modulate airflow to keep the components within desired temperature limits. For example, a server in a well-designed data center, particularly if located low in the rack, will likely see a T_A of 20°C (68°F) or less. However, the thermal solution in the server is normally designed to handle a T_A of 35°C (95°F). If the inlet temperature is at the lower value, the case temperature will be lower. Then, much less airflow is required, and if variable flow capability is built into the server, it will run quieter and consume less power. The server airflow



(and hence T_RISE) will vary between the T_A = 20°C (68°F) and 35°C (95°F) cases, a variation described in ASHRAE's Thermal Guidelines for Data Processing Environments. The publication provides a detailed discussion of what data should be reported by the server manufacturer and in which configuration.

Another misconception is that the temperature of the air in the server exhaust must be maintained below the server ambient environmental specification. The outlet temperature of the server does not need to be below the allowed value for the environment (typically 35°C [95°F]).

Design Decisions
To understand the problems that can arise if the server design process is not fully understood, revisit the two cases introduced earlier. Consider the fully loaded rack in a space with no other equipment. If sufficient cooling is available in the room, the server thermal requirements likely will be satisfied. The servers will pull the required amount of air to cool them, primarily from the raised floor distribution, but if needed, from the sides and above the server as well. It is reasonable to assume the room is well mixed by the server and room distribution airflow. There likely will be some variation of inlet temperature from the bottom of the rack to the top, but if sufficient space exists around the servers it is most likely not a concern. In this situation, not having the detailed server thermal report, as described in Reference 3, may not be problematic.

At the other limit, a rack is placed in a space that is fully populated with other server racks in a row. Another row sits across the cold aisle facing this row as well as another sitting back-to-back on the hot-aisle side. The space covered by the single rack unit and its associated cold-aisle and hot-aisle floor space often is called a work cell and generally covers a 1.5 m² (16 ft²) area. The work cell comprises the 0.6 m × 0.6 m (2 ft × 2 ft) perforated tile in the front, the area covered by the rack (~0.6 m × 1.3 m [~2 ft × 4.25 ft]), and the remaining uncovered solid floor tile on the hot-aisle side.

Figure 2: The work cell is shown in orange.

Consider the airflow in and around the work cell. Each work cell needs to be able to exist as a stand-alone thermal zone. The airflow provided to the zone comes from the perforated tile, travels through the servers, and exhausts out the top-back of the work cell where the hot aisle returns the warm air to the inlet of the room air handlers. The work cell cannot bring air into the front of the servers from the side as this would be removing air from another work cell and shorting that zone. No air should come in from the top either as that will bring air at a temperature well above the desired ambient and possibly above the specification value for T_A (typically 35°C [95°F]). Based on this concept of the work cell it is clear that designers must know the airflow through the servers or else they will not be able to adequately size the flow rate per floor tile. Conversely, if the airflow is not adequate, the server airflow will recirculate, causing problems for servers being fed the warmer air.

If the design basis of the data center includes the airflow rates of the servers, certain design decisions are needed. First, the design must provide enough total cooling capacity for the peak, matching the central plant to the load.

Another question is at what temperature to deliver the supply air. Lowering this temperature can reduce the required fan size in the room cooling unit but also can be problematic, as the system, particularly in a high density data center, must provide the minimum (or nominal) airflow to all of the work cells. A variant of this strategy is that of increasing the ΔT. Doing this allows a lower airflow rate to give the same total cooling capability. This will yield lower capital costs, but if the airflow rate is too low, increasing the ΔT will cause recirculation. Also, if the temperature is too low, comfort and ergonomic issues could arise.

If the supplier has provided the right data, another decision must be made. Should the system provide enough for the peak airflow, or just the typical? The peak airflow rate will occur when T_A = 35°C (95°F) and the typical when T_A = 20°C to 25°C (68°F to 77°F). Sizing the air-distribution equipment at the peak flow will result in a robust design with flexibility, but at a high cost. Another complication in sizing for the peak flow, particularly in dense data centers, is that it may prove difficult to move this airflow through the raised floor tiles, causing an imbalance or increased leakage elsewhere. Care must be taken to ensure the raised floor is of sufficient height and an appropriate design for the higher airflows.

If the nominal airflow rate is used as the design point, the design, installation, and operation (including floor tile selection for balancing the distribution) must be correct for the proper operation of the data center, but a cost savings potential exists. It is essential to perform some level of modeling to determine the right airflow. In this design, any time the servers ramp up to their peak airflow rate, the racks will be recirculating warm air from the hot aisle to feed some server inlets.

This occurs because the work cell has to satisfy its own airflow needs (because its neighbors are also short of airflow) and, if the servers need more air, they will receive it by recirculating. Another way to visualize this is to consider the walls of symmetry around each work cell and recall that there is no flux across a symmetry boundary. The servers are designed to operate successfully at 35°C (95°F) inlet air temperatures, so if the prevalence of this recirculation is not too great, the design should be successful.

If the detailed equipment list is unknown when the data center is being designed, the airflow may be chosen based on historical airflows for similarly loaded racks in data centers of the same



load and use patterns. It is important to ensure the owner is aware of the airflow assumptions made and any limits that the assumptions would place on equipment selection, particularly in light of the trend towards higher power density equipment. The airflow balancing and verification would then fall to a commissioning agent or the actual space owner. In either case, the airflow assumptions need to be made clear during the computer equipment installation and floor tile setup.

Discussions with a leading facility engineering company in Europe provide an insight into an alternate design methodology when the equipment list is not available. A German engineering society standard on data center design requires a fixed value of 28°C at 1.8 m (82°F at 6 ft) above the raised floor. This includes the hot aisle and ensures that if sufficient airflow is provided to the room, all servers will be maintained below the upper temperature limits even if recirculation occurs.

Using this approach, it is reasonable to calculate the total airflow in a new design by assuming an inlet temperature of 20°C (68°F) (low end of Thermal Guidelines) and a discharge temperature of 35°C (95°F) (maximum inlet temperature that should be fed to a server through recirculation) and the total cooling load of the room. A detailed design of the distribution still is required to ensure adequate airflow at all server cold aisles.

Figure 3: Rack recirculation problem. (Panel title: Full Data Center. Color scale: temperature in °C, from below 12 to above 84.425.)

The Solution
The link for information and what is needed for successful design is well defined in Thermal Guidelines. Unfortunately, it is only now becoming part of server manufacturers' vocabulary.

The data center designer needs average and peak heat loads and airflows from the equipment. The best option is to obtain the information from the supplier. While testing is possible, particularly if the owner already has a data center with similar equipment, this is not a straightforward process as the server inlet temperatures and workload can affect the airflow rate. Thermal Guidelines provides information about airflow measurement techniques.

The methodology of the German standard also can be used, recognizing recirculation as a potential reality of the design and ensuring discharge temperatures are low enough to support continued computer operation. Finally, the worst but all-too-common way is to use a historical value for ΔT and calculate a cfm/kW based on the historical value.

In any case, the total heat load of the room and the airflow need to be carefully considered to ensure a successful design.

Effecting Change
The use of Thermal Guidelines has not been adopted yet by all server manufacturers. The level of thermal information provided by the same manufacturer can even vary from product to product. During a recent specification review of several different servers, one company provided extensive airflow information, both nominal and peak, for their 1U server but gave no information on airflow for their 4U server in the same product line.

If data center operators and designers could convince their information technology sourcing managers to only buy servers that follow Thermal Guidelines (providing the needed information), the situation would rectify itself quickly. Obviously, that is not likely to happen, nor should it. On the other hand, those who own the problem of making the data center cooling work would help themselves by pointing out to the procurement decision-makers that they can have a high degree of confidence in their data center designs only for those servers that adhere to the new publication. As more customers ask for the information, more equipment suppliers will provide it.

Summary
The information discussed here is intended to assist data center designers in understanding the process by which the thermal solution in the server is developed. Conversely, the server thermal architect can benefit from an understanding of the challenges in building a high density data center. Over time, equipment manufacturers will continue to make better use of Thermal Guidelines, which ultimately will allow more servers to be used in the data centers with better use of this expensive and scarce space.

References
1. Processor Spec Finder, Intel Xeon Processors. http://processorfinder.intel.com/scripts/details.asp?sSpec=SL7PH&ProcFam=528&PkgType=ALL&SysBusSpd=ALL&CorSpd=ALL.
2. Beaty, D. and T. Davidson. 2003. New guideline for data center cooling. ASHRAE Journal 45(12):28–34.
3. TC 9.9. 2004. Thermal Guidelines for Data Processing Environments. ASHRAE Special Publications.
4. Koplin, E.C. 2003. Data center cooling. ASHRAE Journal 45(3):46–53.
5. Rouhana, H. 2004. Personal communication. Mechanical Engineer, M+W Zander Mission Critical Facilities, Stuttgart, Germany, November 30.
6. Verein Deutscher Ingenieure, VDI 2054. 1994. Raumlufttechnische Anlagen für Datenverarbeitung. September.
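
As a worked illustration of Equation 1 in the Server Design Focus section, the short Python sketch below computes the required case-to-ambient thermal resistance from the article's example values (TDP = 103 W, maximum case temperature 72°C, ambient 35°C). The function name and the 25°C comparison case are illustrative assumptions and do not come from the article.

```python
def required_case_to_ambient_resistance(
    t_case_max_c: float, t_ambient_c: float, tdp_w: float
) -> float:
    """Return the required case-to-ambient thermal resistance in degC/W.

    Implements Equation 1: (T_C - T_A) / TDP.
    """
    if tdp_w <= 0:
        raise ValueError("TDP must be positive")
    return (t_case_max_c - t_ambient_c) / tdp_w


if __name__ == "__main__":
    # Article example: TDP = 103 W, T_C = 72 degC, T_A = 35 degC.
    design_point = required_case_to_ambient_resistance(72.0, 35.0, 103.0)
    print(f"Required resistance at 35 degC ambient: {design_point:.2f} degC/W")

    # Hypothetical cooler inlet (25 degC is an assumed value for comparison):
    # a larger allowable resistance means a less demanding thermal solution.
    cooler_inlet = required_case_to_ambient_resistance(72.0, 25.0, 103.0)
    print(f"Required resistance at 25 degC ambient: {cooler_inlet:.2f} degC/W")
```

The comparison case shows why a server with variable-speed air movers can reduce airflow at lower inlet temperatures: the same component limit tolerates a higher thermal resistance.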

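
The room-level airflow estimate described for the German-standard approach (20°C inlet, 35°C discharge, known total cooling load) can be sketched with a simple sensible-heat balance. This is a minimal sketch, not the standard's own procedure: the air density and specific heat below are assumed constant values for air near room temperature at sea level, and the 100 kW room load is hypothetical.

```python
# Assumed constant air properties (not from the article).
AIR_DENSITY_KG_M3 = 1.2
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0
CFM_PER_M3_S = 2118.88  # 1 m^3/s expressed in cubic feet per minute


def airflow_m3_s(heat_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow that carries heat_load_w at an air temperature rise delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_load_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k)


def cfm_per_kw(delta_t_k: float) -> float:
    """Airflow in cfm required per kW of heat load for a given temperature rise."""
    return airflow_m3_s(1000.0, delta_t_k) * CFM_PER_M3_S


if __name__ == "__main__":
    inlet_c, discharge_c = 20.0, 35.0   # values quoted in the article
    room_load_kw = 100.0                # hypothetical room cooling load
    delta_t = discharge_c - inlet_c

    total_m3_s = airflow_m3_s(room_load_kw * 1000.0, delta_t)
    print(f"Temperature rise: {delta_t:.0f} K")
    print(f"Total airflow: {total_m3_s:.2f} m^3/s ({total_m3_s * CFM_PER_M3_S:.0f} cfm)")
    print(f"Airflow per kW: {cfm_per_kw(delta_t):.0f} cfm/kW")
```

Under these assumed properties a 15 K rise works out to roughly 117 cfm/kW. Because the result scales inversely with ΔT, a historical ΔT that does not match the installed servers shifts the airflow estimate in direct proportion.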

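
The work-cell argument in the Design Decisions section can be illustrated with a small sketch: it checks the quoted 1.5 m² footprint and estimates how much air a work cell must recirculate when its perforated tile is sized for nominal rather than peak server airflow. The class and function names are hypothetical, the 0.6 m depth of the hot-aisle tile is an assumption inferred from the stated total area, and the peak airflow figure is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkCell:
    """Footprint of one rack plus its share of cold-aisle and hot-aisle floor."""
    width_m: float = 0.6
    cold_aisle_tile_depth_m: float = 0.6   # perforated tile, from the article
    rack_depth_m: float = 1.3              # approximate rack depth, from the article
    hot_aisle_tile_depth_m: float = 0.6    # assumed; not stated in the article

    @property
    def area_m2(self) -> float:
        depth = (self.cold_aisle_tile_depth_m
                 + self.rack_depth_m
                 + self.hot_aisle_tile_depth_m)
        return self.width_m * depth


def recirculated_airflow(server_demand: float, tile_supply: float) -> float:
    """Airflow the servers must draw from the hot aisle (same units as the inputs).

    With no flux across the work-cell symmetry boundaries, any demand above
    the tile supply can only be met by recirculated exhaust air.
    """
    return max(0.0, server_demand - tile_supply)


if __name__ == "__main__":
    cell = WorkCell()
    print(f"Work cell area: {cell.area_m2:.2f} m^2")

    peak_demand = 1000.0     # placeholder peak rack airflow, any consistent unit
    nominal_fraction = 0.65  # midpoint of the 60% to 70% range quoted in the article
    tile_supply = peak_demand * nominal_fraction
    shortfall = recirculated_airflow(peak_demand, tile_supply)
    print(f"Recirculated at peak: {shortfall:.0f} ({shortfall / peak_demand:.0%} of demand)")
```

The sketch only quantifies the shortfall. Estimating the resulting inlet temperatures would need the airflow modeling the article recommends.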