Data center owners and designers, Part 2: A checklist for an effective design effort
The owner's responsibility is to establish the requirements and make the decisions necessary to implement them. The design engineer is hired to develop solutions and provide hard data supporting the owner's decision making.
This is the final portion of a two-part series to provide proven methods for improving the working relationship between the owner and the design engineering team during a data center construction project. When this relationship is problematic, initial repercussions are in the traditional project elements of cost and schedule. Inevitably, the impact is in data center performance (i.e., it doesn't work), which results in career penalties and squandered investment of millions of company dollars.
The first part in the series (December 2009, page 6) focused on the owner's responsibilities that can most significantly facilitate the design engineer's tasks. This article will include guidance to avoid the common shortfalls of the design engineer's topology solutions, as identified during Tier Certification reviews and the Accredited Tier Designer curriculum.
ESTABLISH A PROJECT LANGUAGE
During the development of design solutions—and necessary supporting data—the design engineer creates and then justifies a translation of the owner's requirements. The engineer defines terms, explains the practical application of industry standards, and complements the project owner's awareness of the performance impacts of topology and equipment selection. A transfer of knowledge and experience at the start of the project enhances project success and results in an informed client that is better able to take ownership of and operate the facility. The engineer of record should act as a “check valve” on inaccuracies or misinformation that results in inappropriate project expectations. The ability to address miscues in the early stages of the project was the impetus for the Uptime Institute to develop detailed course instruction on Tier topology for design engineers (Accredited Tier Designer).
An effective project language must consistently reference performance, redundancy, technology, ecological, and operations requirements. The effective engineer of record must employ the same project language down to the details on technical documents. Labeling should be developed for the data center owner/operator's ease and use, rather than for the engineering team's convenience. On a statistical basis, the majority of deficiencies the Uptime Institute recognized during its Tier Certification reviews pertain to a lack of coordination across engineering disciplines, including the inconsistent labeling of design elements.
These deficiencies often include unlabelled equipment, varying naming conventions (e.g., CHWP-A on mechanical and CHWP-1 on electrical), and contradictory equipment models and capacities. Clear communication of design intent is critical if the owner's requirements are to survive the contractor's “ways and means” prerogatives.
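A cross-discipline labeling check of this kind can be partly automated. The following is a minimal sketch, assuming each discipline's equipment schedule has been exported as a set of tags; the data layout, the function name, and the sample tags other than CHWP-A and CHWP-1 are hypothetical.

```python
def find_label_mismatches(schedules):
    """Return tags that do not appear in every discipline's schedule.

    `schedules` maps a discipline name to the set of equipment tags used
    on that discipline's drawings (a hypothetical data layout).
    """
    all_tags = set().union(*schedules.values())
    mismatches = {}
    for tag in sorted(all_tags):
        # List the disciplines whose documents never mention this tag.
        missing = sorted(d for d, tags in schedules.items() if tag not in tags)
        if missing:
            mismatches[tag] = missing
    return mismatches


# Illustrative data only: the mechanical set uses letters, the electrical set numbers.
schedules = {
    "mechanical": {"CHWP-A", "CHWP-B", "CRAH-1"},
    "electrical": {"CHWP-1", "CHWP-2", "CRAH-1"},
}
mismatches = find_label_mismatches(schedules)
for tag, missing in mismatches.items():
    print(f"{tag}: not found on {', '.join(missing)} documents")
```

A check like this flags only inconsistent names; contradictory models and capacities would need the same comparison applied to those fields.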
MODEL AN EFFECTIVE TEAM STRUCTURE
Attempts to communicate directly with each client stakeholder group can lead to conflicting or incomplete guidance. The lead engineer must set a functional example for the client to emulate, with a coherent structure and the fewest primary points of contact. This will help to ensure consistency across the multiple engineering disciplines and avoid over- or under-provisioning of specific aspects of the facility. Appointment of a lead engineer (or equivalent) ensures an integrated solution of consistently applied technical concepts.
One of the hallmarks of an effective team structure is a client that defines, commits to, and then abides by the project requirements. Should an issue arise that requires sign-off from the stakeholder, it is best managed on a point-of-contact basis rather than submitting every issue to open discussion. This does not limit full and informed discussion of project issues, but rather sets an expectation of accountability for the decision maker.
Changes to the project requirements must be documented with amendments formally acknowledged by stakeholders to track emergent solutions to the project requirements. The composition of the project team frequently changes, so the only static perspective on the team is the requirements documentation. When a decision is challenged, the response must be founded on fact rather than subjective criteria or emotional responses.
Open brainstorming/exploratory sessions with all stakeholder groups need to precede the design working sessions and should conclude prior to schematic-level design. The design engineer's capability to develop effective or even responsive solutions can be adversely affected by disparate or unqualified guidance. Recognizing and coordinating client experts will help to prevent a situation in which, for example, an influential but inexperienced network architect drives a UPS topology working session. The design solution, schedule, and budget can suffer when the client team members advise the engineering team outside their core competency or purview.
For the same reasons that a diligent client invests in the requirements definition process, a responsive engineer of record is fluent in the unique project characteristics before initiating a design. It is not uncommon for design engineers to bring schematic-level design drawings to a requirements briefing. Despite the good intention of providing a teaching tool, this tactic re-orients the meeting objective and encourages the owner to remain soft on requirements definition. It also prematurely pressures the project, inviting guesswork and improvised solutions. In this scenario, requirements are not handed over to be incorporated into the design, but tweaked to fit the schematic design generated previously. (Note: This scenario is based on experience and is seconded by designers outside the Uptime Institute.)
USE SITE TOURS AS TEACHING TOOLS
Visit other sites to show the owner the variety of solutions that meet the project's performance objectives. Site tours help the design team to justify solutions as the solutions are demonstrated rather than touted. While site tours will incur travel and staff costs, this effort is nominal and prudent when weighed against a data center investment that will be measured in $10 million increments.
If possible, tour a number of sites that have been operational for 5+ years.
Try to observe solutions that have weathered extended operation. Explore sites beyond those that the project team has completed. As feasible, arrange to see sites in the owner's industry.
Site visits will introduce the owner to a variety of ways to implement Tier topology and Operational Sustainability concepts. Gallery cooling, or cooling units that are outside of a computer room in a perimeter corridor or equivalent, is a solution that typically is shrugged at on paper, but is compelling when seen firsthand.
Collocation providers and manufacturers use their site tours as a business development tool and allocate more resources to them than a corporate-owned and -operated facility would. Carefully contextualize showcase data centers that respond to a distinct performance objective. Script questions with the owner team before the tour and convene afterwards to discuss that data center's specific strategy. A facility designed to showcase a specific product or products has a different objective than a data center that supports functional business operations. Without context, a site tour may haunt the project.
MISCONCEPTIONS LEAD TO DEFICIENCIES
The following insights are based on experience with actual project designs in more than 25 countries that have been formally submitted to the Uptime Institute for Design Certification. Failure to rigorously attend to the fundamental concepts underlying the Tier topology standard can lead to squandered investment if the Tier standard was part of the project criteria. A single design can exhibit both excessive complexity and undersized capacity.
Project documentation for Tier IV often includes a stipulation for N+N (2N) or System + System (S+S) topology. S+S represents one solution for Tier IV, and arguably the most robust and most expensive, but is not required for Tier Certification. The fundamental concept of Tier IV is Fault Tolerance: the provision of “N” after any failure. Adherence to S+S does not ensure Fault Tolerance, which incorporates Continuous Cooling, Compartmentalization, and autonomous installed infrastructure response to an event. Adherence to N after any failure, rather than S+S, can offer significant cost savings just in terms of reducing the number of engine generators. (A common/single bus in the engine-generator plant is just as significant and common an underinvestment as excessive engine generators.)
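The capacity side of "N after any failure" can be illustrated with a short calculation. This is a sketch under stated assumptions, not a Tier assessment method: the plant is modeled as a list of (name, capacity, bus) entries, a failure is the loss of any one engine generator or any one bus, and the function name and figures are hypothetical. It does not address Continuous Cooling, Compartmentalization, or autonomous response.

```python
def failures_that_drop_below_n(critical_load_kw, generators):
    """List single failures that leave less capacity than the critical load.

    `generators` is a list of (name, capacity_kw, bus) tuples. Losing a bus
    is assumed to remove every generator connected to it.
    """
    shortfalls = []
    total_kw = sum(cap for _, cap, _ in generators)
    for name, cap, _ in generators:
        if total_kw - cap < critical_load_kw:
            shortfalls.append(f"generator {name}")
    for bus in sorted({b for _, _, b in generators}):
        remaining_kw = sum(cap for _, cap, b in generators if b != bus)
        if remaining_kw < critical_load_kw:
            shortfalls.append(f"bus {bus}")
    return shortfalls


load_kw = 4000  # hypothetical critical load, so N is four 1,000-kW units

# Five units on one common bus: survives a generator loss, not a bus loss.
single_bus = [(f"G{i}", 1000, "A") for i in range(1, 6)]

# Six units across three buses: any one generator or bus can be lost.
three_bus = [("G1", 1000, "A"), ("G2", 1000, "A"),
             ("G3", 1000, "B"), ("G4", 1000, "B"),
             ("G5", 1000, "C"), ("G6", 1000, "C")]

print(failures_that_drop_below_n(load_kw, single_bus))
print(failures_that_drop_below_n(load_kw, three_bus))
```

In this hypothetical, the six-unit plant retains N after any single modeled failure, whereas a 2N stipulation would call for eight units; the five-unit common-bus plant shows the underinvestment case.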
Another common cause of overinvestment is confusion about the integration of the Uptime Institute's “Tier Standard: Topology” and TIA-942. Uptime Institute Tiers are autonomous and have neither an impact on nor comity with the Telecommunications Industry Association (TIA). TIA's mirroring of the four Tiers can lead to significant confusion, resulting in excessive provisioning, per TIA's checklist, on projects with an Uptime Institute Tier objective. A frequent over-expenditure is 72 h of fuel storage, per TIA-942. Uptime Institute “Tier Standard: Topology” does not set a number of hours for fuel storage. Operational Sustainability dictates fuel storage as a best practice to be defined by the owner and driven by site selection. If, for example, the client has a Tier III objective with an Operational Sustainability requirement for 12 h of fuel storage, then the fuel storage and distribution systems must meet Concurrent Maintenance requirements and still meet the 12-h requirement.
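The 12-h example can be worked through numerically. The sketch below assumes that a whole tank is the unit taken out of service for maintenance and that the burn rate at the critical load is known; tank sizes, the burn rate, and the function name are hypothetical, and the distribution system is not modeled.

```python
def fuel_hours_during_maintenance(tank_liters, burn_rate_lph):
    """Worst-case runtime in hours with any single tank out of service."""
    total = sum(tank_liters)
    return min((total - tank) / burn_rate_lph for tank in tank_liters)


required_hours = 12
burn_rate_lph = 500  # hypothetical consumption at the critical load

# Two 3,000-L tanks hold 12 h in total, but only 6 h with one tank isolated.
small_plant = fuel_hours_during_maintenance([3000, 3000], burn_rate_lph)

# Two 6,000-L tanks still provide 12 h with one tank isolated.
larger_plant = fuel_hours_during_maintenance([6000, 6000], burn_rate_lph)

print(small_plant >= required_hours, larger_plant >= required_hours)
```

The point of the calculation is that the owner's stated hours have to hold while maintenance is under way, not only when every tank is in service.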
In RFPs, requirements documents, and project narratives, the project objective of “Tier III N+1” occurs frequently. The Uptime Institute recommends qualifying the Tier objective with its fundamental concept, such as Tier III Concurrently Maintainable and Tier IV Fault Tolerant.
An N+1 component count approach may be inherently problematic because (a) there is confusion about the definition of N and how much more to spend to reach +1, as evidenced by requests for clarification received by the Uptime Institute from project teams worldwide, and (b) there is no direct correlation between component count and Tiers.
The Uptime Institute recommends the following definition of N: The critical load or the equipment required to serve that load. Misunderstanding of component count language on the owner's team can lead to erroneous guidance to the design engineer or a disconnect between expectations and the topology solution.
Continuing with the example of Tier III N+1, Tier III topology requires a minimum of N+1 components to achieve Tier III. Yet, a strict adherence to N+1 criteria ensures nothing beyond Tier II; it meets the fundamental concept of Redundant Capacity Components.
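For reference, the component count itself is simple arithmetic once N is tied to the critical load. The sketch below assumes identical units and uses hypothetical figures; it returns a count only and says nothing about whether the resulting plant can be maintained without affecting the load.

```python
import math


def units_required(critical_load_kw, unit_capacity_kw):
    """N: the number of identical units needed to serve the critical load."""
    return math.ceil(critical_load_kw / unit_capacity_kw)


n = units_required(900, 300)   # hypothetical 900-kW load, 300-kW units
n_plus_1 = n + 1               # one Redundant Capacity Component
print(n, n_plus_1)
```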
Tier III incorporates all the requirements of Tier I and Tier II. So Redundant Capacity Components is a minimum, but this project definition distracts from the fundamental concept of Concurrent Maintenance. Concurrent Maintenance is not ensured by component count. A Tier III compliant configuration allows for maintenance, repair, or replacement of each and every capacity or distribution component, without affecting the critical load. Concurrent Maintenance cannot be obtained by component count because Concurrent Maintenance is often attained out of plain sight in the form of pipes, pumps, and parts.
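The "each and every" test can be sketched as a path check: remove one component at a time and confirm that the critical load is still served. The example below is a simplified illustration, assuming the system is described as a directed graph of hypothetical component names. It checks only that a path remains; it does not check remaining capacity or the isolation valves needed to remove a component safely.

```python
from collections import deque


def reachable(edges, start, goal, removed):
    """Breadth-first search from start to goal, skipping the removed node."""
    if removed in (start, goal):
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in edges.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def blocks_concurrent_maintenance(edges, source, load):
    """Components whose removal leaves no path from source to load."""
    components = set(edges) | {n for targets in edges.values() for n in targets}
    components -= {source, load}
    return sorted(c for c in components if not reachable(edges, source, load, c))


# N+1 pumps feeding one shared header: the pipe, not the pumps, is the problem.
shared_header = {
    "plant": ["pump-A", "pump-B"],
    "pump-A": ["header"],
    "pump-B": ["header"],
    "header": ["critical-load"],
}

# The same pumps with two independent headers.
dual_header = {
    "plant": ["pump-A", "pump-B"],
    "pump-A": ["header-A"],
    "pump-B": ["header-B"],
    "header-A": ["critical-load"],
    "header-B": ["critical-load"],
}

print(blocks_concurrent_maintenance(shared_header, "plant", "critical-load"))
print(blocks_concurrent_maintenance(dual_header, "plant", "critical-load"))
```

Both arrangements have the same pump count, yet only the second passes this path check, which is why a component count alone cannot establish Concurrent Maintenance.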
It is difficult to reconcile the Tier consequences of specific subsystems with a checklist approach to a design solution. For example, both the makeup water source and engine-generator ratings have unique Tier requirements considerations that mandate special attention. The Uptime Institute clarifies the Tier topology consequences of the engine-generator and makeup water subsystems in its Accredited Tier Designer Technical Paper Series. In regard to engine generators, N+1 does not ensure continuous runtime capability.
The number and placement of chilled water system valves are a common Tier shortfall. The “each and every” dictum of Tier III and IV applies to all mechanical components, valves as well as cooling units. This leads to a valve count that runs counter to the common preference for keeping valves to a minimum.
The widespread faith in the reliability of the valves in facility cooling systems is undeserved. Routinely, data center operators have addressed a leaking or failed valve, even with industrial-rated valves. If cost is the driver to reduce the number of valves, can the data center operation afford a leak, flood, or outage for want of a single valve?
Valves are typically avoided whenever possible in nuclear power plants, refineries, and marine vessels. Because there are no takeouts or dry docks for a Concurrently Maintainable (Tier III) or Fault Tolerant (Tier IV) data center, the facility must be rebuilt without affecting the critical load. It is important to note that for Tier IV, all Tier III requirements apply, in addition to the requirement of autonomous operation.
One of the leading causes of design shortfalls in facilities with a Tier objective is reference to hybridized Tier documents. Often, these documents attempt to mold the Tiers into a checklist or cookbook. In order to facilitate practical application of the Tiers, the Uptime Institute published “Tier Standard: Topology.” This provides documentation on Tier requirements and supersedes previous white papers. Additionally, the Uptime Institute publishes the Accredited Tier Designer Technical Paper Series to clarify the Tier consequences of selected subsystems. Disregard and discard knockoffs at first contact.
Finally, a prevalent misconception is that the Tiers are bought, i.e., compliance is ensured by the enormity of the investment. As illustrated by the above common shortfalls, Tier compliance is met through the rigorous and consistent application of the fundamental concepts of the Tiers in all relevant subsystems. A similar rigor and consistency both in team structure and communication with the owner will ensure that the designers maintain focus on the development of innovative and responsive solutions.
Kudritzki is vice president of Uptime Institute. He has managed the expansion of the Tier Program into South America, Europe, the Middle East, and Africa. He continues to ensure the integrity of the Tier Program Certifications worldwide, and has assisted in the development of numerous Uptime Institute technical papers and educational programs, including the Accredited Tier Designer curriculum.
The content of this article is informed by the collective resources of Uptime Institute Professional Services: field experience as Owner's Representative and Tier Certification Authority on high-availability projects worldwide; consultants' personal data center development and operations experience; real-time, real-life reporting of Site Uptime Network members; analysis of the Uptime Institute's Abnormal Incident Report database; and the professional engineers attending the Accredited Tier Designer curriculum.