Data centers for Uncle Sam
Growing computing needs have forced many government agencies into a corner they did not foresee. From simple storage and backup to enterprise computing and research, the infrastructure to support the required flow of information is either missing or insufficient.
As people, platforms, and equipment continue to proliferate, the infrastructure, like any other underpinning of everyday life, must be expanded to accommodate the increase. The ever-increasing demand for scalability has caused the mission critical and data center industries to redefine the technology refresh cycle as an 18- to 24-month period. However, most government facilities have not received a significant upgrade in more than 8 years. Faced with a need to upgrade, rehabilitate, or completely rebuild and transform their data centers, many government agencies find themselves in an unfamiliar world of changing technologies and confusing terminology.
A perpetual tug of war
One of the most unusual aspects of data center development is the unique way in which the program, funding, and interested parties come together to define their needs and go about the process of addressing them. So what is a data center, exactly? Is it a building, a utility, or simply a node on the information superhighway?
The complexity involved in engineering a data center compared to other government facilities is often underestimated. Data centers typically require an overall available facility power of at least 5 MW, and the power density can exceed 200 W/sq ft. Often referred to by industry professionals as a fourth utility because of their role as the heart of a data-hungry circulation system, data centers are becoming as critical to our day-to-day lives and national defense as power plants, nuclear reactors, and drinking water.
What makes understanding the role of data centers even more complex is their unique relationship with the movement, storage, and protection of information. One way this peculiar understanding of data centers gets manifested is in the way they are procured, particularly in the public sector.
The complications arise at the crossroads of the data center's two most interested constituencies: information technology (IT) and facilities. In most government agencies these two are the major recipients of, and therefore the major competitors for, precious capital funding. The data center is the battleground where their interests most often compete. Is the data center a building or facility, as the design and construction people will argue? Or is it part and parcel of the great IT continuum, as the CIO and the IT director see it? In reality, it is neither and both, and this ambiguity has been one of the major contributing factors to the way data centers are developed in the government arena.
At one end of the spectrum are the typical design-bid-build and design-build projects. From a facilities perspective, these are fairly straightforward. Where they suffer is in development time: from programming to design, from construction through commissioning, and ultimately to migration of IT assets. In the government sector, this process can often take 3 to 8 years or longer, depending on the procurement methodology, funding, and approval stream. Given that today’s data centers go through technology refreshes every 18 to 24 months, a project could be three to five cycles behind the then-current logic on data center design by the time it is actually operational.
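The arithmetic behind that gap can be sketched in a few lines. This is only a rough illustration: it assumes a constant refresh interval and simply divides delivery time by it, and the function name is hypothetical. The sample values are the ranges quoted above.

```python
def refresh_cycles_elapsed(delivery_years: float, refresh_months: float) -> float:
    """Number of technology refresh cycles that pass while a project is delivered."""
    if refresh_months <= 0:
        raise ValueError("refresh_months must be positive")
    return delivery_years * 12 / refresh_months


# Ranges quoted in the text: 3 to 8 years of delivery, 18- to 24-month refreshes.
fewest = refresh_cycles_elapsed(3, 24)  # shortest delivery, slowest refresh
most = refresh_cycles_elapsed(8, 18)    # longest delivery, fastest refresh
print(f"{fewest:.1f} to {most:.1f} refresh cycles")
```

Under these assumptions, the shortest projects fall about one and a half cycles behind, while the longest fall more than five cycles behind.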
Today’s data centers typically employ at least 36 in. of raised floor, overhead cable management, and even such revitalized ideas as water-cooled IT racks (thought to be blasphemy just a few years ago). However, an agency that employed one of these more traditional government procurement processes might find itself today with a newly completed data center that has just 12 in. of raised floor (for the now rarely employed under-floor cabling) and an energy plant sized to accommodate 75 W/sq ft. As with other forms of computer technology, almost 75% of data centers that are now only 10 years old are projected to be replaced entirely.
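To put the density mismatch in perspective, the sketch below estimates how much of a floor could be populated if the plant was sized for one density and the equipment now demands another. It is a simplified illustration that assumes power is the only constraint and that load is spread evenly; the function name is hypothetical, and the 200 W/sq ft figure is borrowed from the density range mentioned earlier in the article.

```python
def usable_floor_fraction(designed_w_per_sqft: float, required_w_per_sqft: float) -> float:
    """Fraction of the raised floor that can be populated at the required density."""
    if designed_w_per_sqft < 0 or required_w_per_sqft <= 0:
        raise ValueError("densities must be positive")
    # A plant with spare capacity can still only fill 100% of the floor.
    return min(1.0, designed_w_per_sqft / required_w_per_sqft)


# Plant sized for 75 W/sq ft, equipment that needs 200 W/sq ft.
fraction = usable_floor_fraction(75, 200)
print(f"{fraction:.1%} of the floor can be populated")
```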
But how do government agencies deal with brand new data centers that are almost or already obsolete? A second dynamic in the government data center development cycle, particularly in these more traditional project delivery systems, is the separation between “shell and core” and “fit-up.” Not an uncommon concept to those familiar with speculative office buildings or retail development, this methodology leaves the expertise on each side of the equation with its corresponding department. The building “shell and core” gets designed and built under the direction of the facilities staff, using architects, engineers, and contractors. The IT space then gets “fit-up” by large-scale IT consultants like technology companies, systems integrators, or defense contractors.
Often when the second group gets on-site, they realize how under-provisioned the facility is because it was really designed for a technology three generations removed from the present. These IT consultants must now embark on retrofitting the facility to its current requirement, and the government agency has to tap both sides of its precious capital outlay—facilities and IT—to overlap services, extend the process, and finally get into a working data center.
To avoid this redundancy in services, and to reduce the time from conception to reality, many are now turning to more streamlined public-private ventures. This is particularly true in cases where capital funding doesn’t flow in a timely manner and ideas like a long-term lease, collocation, or even full-on hosting or outsourcing become an attractive alternative. Once the government users have gone beyond their issue of how to get a data center, the next decision-making and policy overlays are there to confound them further. How will we pay these extraordinary power bills and meet our mandates for a “greener” society? How, too, will our information remain safe from natural disaster, deliberate forms of criminal activity or terrorism, the inevitable shortfalls of the systems that compose the data center, and the people that operate and maintain it? Let’s dig deeper.
What is reliability?
The strict operational requirements imposed by specific government operations add more facets to the equation. These conditions are related to the security umbrella encompassing our nation’s activities at home and abroad. Talking to a data center facilities manager about the concept of reliability typically yields a variety of responses, indicating a basic lack of understanding not only of what it is but also of how to apply the concept effectively. Note that reliability here is referenced with a small “r.” It denotes a numeric probability of occurrence, whereas Reliability with a capital R is an umbrella term covering reliability, availability, and maintainability (RAM).
So just how important is reliability? Organizations such as IEEE, NFPA, and the U.S. Army Corps of Engineers have created a variety of documentation defining, describing, and qualifying reliability as an important concept in the daily operations of government facilities. IEEE has devoted a book to reliability, called the “Gold Book,” in which it introduces the basic concepts and their applications.
In Article 708, the NEC is for the first time beginning to address the reliability issue. This is the result of the Dept. of Homeland Security’s requests to address critical operations power systems (COPS) and how to survive both natural and man-made disasters. The U.S. Army Corps of Engineers Power Reliability Enhancement Program (PREP) has produced more than 15 Army technical manuals supporting the reliability concept for government operations. The list goes on, but it is clear that interest in applying reliability is growing.
The biggest problem with addressing reliability is terminology and definitions. Reliability is possibly the most misused term in the facility world. So then how does one go about discussing reliability issues among people with different understandings of the word?
The components of reliability are statistically based and include reliability, availability, and maintainability. It took several years to develop, test, and verify these concepts with positive results realized in the weapon system community.
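The three components can be illustrated with their most common textbook forms. The sketch below assumes constant failure and repair rates (the exponential model), a simplification the article does not state; the function names and sample values are hypothetical and are not taken from the Gold Book or any Army manual.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of running the whole mission without failure (constant failure rate)."""
    return math.exp(-mission_hours / mtbf_hours)


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the item is up, counting only failures and repairs."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def maintainability(repair_window_hours: float, mttr_hours: float) -> float:
    """Probability a repair finishes within the window (exponential repair times)."""
    return 1.0 - math.exp(-repair_window_hours / mttr_hours)


# Hypothetical component: 50,000 h mean time between failures, 8 h mean time to repair.
print(f"reliability over one year:   {reliability(8760, 50_000):.4f}")
print(f"inherent availability:       {inherent_availability(50_000, 8):.6f}")
print(f"repair done within 24 hours: {maintainability(24, 8):.4f}")
```

The three numbers answer different questions, which is one reason the single word "reliability" causes so much confusion: a component can be highly available yet still have a meaningful chance of failing at least once during a year.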
In later years, the government made it the commercial community’s responsibility to grow and propagate these concepts and eventually abolished most of the Military Specifications and Standards. As the facilities community matured, it began to grasp these concepts and apply them to a daily regimen, leading to a resurgence of reliability. This resurgence in reliability concepts also is fueled by the green movement, which is helping to design and operate facilities with the smallest carbon footprints possible.
Strategies for success
The opportunity for designing for high availability and reliability is greatest when designing a new facility. By applying an effective reliability strategy, modeling the systems, designing for maintainability, and ensuring that manufacturing and commissioning do not negatively affect the inherent levels of reliability and maintainability, a highly available facility will result. A reliability strategy describes how an organization approaches reliability for all systems and services it develops and provides to its customers. The strategy can be considered as the basic formula for success, applicable across all types of systems and services.
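One simple form of "modeling the systems" is a reliability block diagram, in which components are combined in series (all must work) or in parallel (any one is enough). The sketch below assumes independent failures and uses made-up availability figures and hypothetical function names; a real study would draw on field data and model far more detail.

```python
def series(*availabilities: float) -> float:
    """Availability of a chain in which every block must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Availability of redundant blocks in which at least one must be up."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Hypothetical power path: utility feed backed by a generator, then UPS, then distribution.
sources = parallel(0.999, 0.98)
power_path = series(sources, 0.9995, 0.9998)
print(f"single path:         {power_path:.6f}")
print(f"two redundant paths: {parallel(power_path, power_path):.9f}")
```

Even this toy model shows why redundancy is easiest to add at the design stage: duplicating a whole path changes the result far more than improving any single component in it.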
A reliability program is the application of the reliability strategy to a specific system or process. Each step in the strategy requires the selection and use of specific methods and tools. For example, various methods can be used to develop requirements or to evaluate potential failures.
To develop requirements, quality function deployment, translations, and analytical methods can be used. Quality function deployment is a technique for deriving more detailed, lower-level requirements, beginning with customer needs. It was developed originally as part of the total quality management movement. Translations are parametric models intended to derive design values of reliability from operational values, and vice versa. Analytical methods include thermal analysis, durability analysis, and predictions.
When it comes to facility O&M, every facility asks the question, “How can we be more efficient?” Modern private and government facilities have begun to recognize that strict adherence to a manufacturer’s recommended tasks and task intervals may not deliver the best bang for the buck. This is partly because a manufacturer’s recommendations do not account for varying or site-specific dynamics and operating conditions.
Reliability-centered maintenance (RCM) analyses seek to rank facility equipment by their presumed operational risk. This rank is achieved by considering reliability metrics and the importance of equipment function to overall facility operation. These equipment ranks and reliability metrics serve to assist facility management in making well-informed and effective decisions pertaining to the overall maintenance program.
Specifically, an RCM analysis can bridge the gap between a manufacturer’s recommendations and a plant’s operating conditions. Sites performing RCM have reported direct cost savings of 14% to 50% in their maintenance programs. These savings are achieved by minimizing unnecessary maintenance tasks and concentrating effort on equipment that poses relatively high operational risk. This more educated maintenance focus also reduces indirect maintenance costs by decreasing maintenance-induced failures, breakdowns and downtime, and repair material requirements. Overall, RCM analysis provides the tools and information necessary to realize decreased maintenance costs while continuing to maintain facility operations.
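The ranking step of an RCM analysis can be sketched as follows. This is a deliberately simplified scoring scheme (failure frequency multiplied by a criticality weight) chosen for illustration; the class, field names, and equipment data are hypothetical, and a full RCM study would also weigh failure modes, consequences, and detectability.

```python
from dataclasses import dataclass


@dataclass
class Equipment:
    name: str
    failures_per_year: float  # reliability metric, e.g. from site history
    criticality: int          # 1 (minor) to 5 (stops the mission)

    @property
    def risk_score(self) -> float:
        """Presumed operational risk: how often it fails times how much that matters."""
        return self.failures_per_year * self.criticality


def rank_by_risk(equipment: list[Equipment]) -> list[Equipment]:
    """Highest-risk equipment first, so maintenance effort can be concentrated there."""
    return sorted(equipment, key=lambda e: e.risk_score, reverse=True)


# Hypothetical inventory.
inventory = [
    Equipment("UPS module", failures_per_year=0.1, criticality=5),
    Equipment("Chiller", failures_per_year=0.5, criticality=4),
    Equipment("Lighting panel", failures_per_year=0.8, criticality=1),
]
for item in rank_by_risk(inventory):
    print(f"{item.name:<15} risk score {item.risk_score:.1f}")
```

In this example the lighting panel fails most often but ranks below the chiller, because a failure there matters far less to facility operation.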
Reliability-based maintenance is the marriage of RCM with supervisory control and data acquisition (SCADA). SCADA is the backbone of controlling the operation of a facility. There are several SCADA systems available to do the job, depending on the size and activity of the facility. Combining these two tools will help the facility not only maintain its mission operational capability, but also possibly improve its reliability profile. The old terminology for this is reliability growth. This is a powerful concept to control the cost of maintenance and to directly address the downtime requirements.
Yet another important issue to keep in mind is the security of the facility. Unified Facilities Criteria (UFC) 4-010-01, Dept. of Defense Minimum Antiterrorism Standards for Buildings, is the most widely referenced antiterrorism/force protection (ATFP) criteria publication. It’s the product of a significant multi-agency effort to seek effective ways to mitigate terrorist threats. Its contributors include the Undersecretary of Defense, the Joint Chiefs of Staff, the U.S. Army Corps of Engineers, the Naval Facilities Engineering Command, and the Air Force Civil Engineer Support Agency. The most recent publication, dated July 31, 2002, was preceded by the Interim Dept. of Defense Antiterrorism/Force Protection Construction Standards from Dec. 16, 1999.
For those who started ATFP-related projects prior to the most recent publication date, it’s important to determine which set of criteria to reference because most projects initiated before the start of government fiscal year 2004 fall under the older standards. UFC 4-010-01 is a living document and its contents are updated as events and new information dictate. Dept. of Defense security personnel should regularly check for updates to ensure they are referring to the latest standards. It’s also important to note that the word “minimum” is part of the standard document’s title. The level of protection required for any particular facility must be assessed and defined by the installation commander or an equivalent authority in charge of the mission.
Careful attention must be paid to the stated intent of UFC 4-010-01. To paraphrase, it states that the standards are intended to minimize the possibility of mass casualties in Dept. of Defense facilities by establishing a minimum level of protection where no known threat of terrorist activity currently exists. The philosophy that underpins the criteria is based on value. It supports the argument that it would be cost-prohibitive to design facilities that address every conceivable threat, but instead that it is possible to provide an appropriate level of protection for all personnel at a reasonable cost.
The concepts and strategies for effective antiterrorism and force protection measures are varied and extensive. Dept. of Defense security personnel and their colleagues in similar government roles should, at the very minimum, have a grasp of the subject. A clear understanding of how ATFP differs from “all threats” force protection is critical, as is the relationship of possible issues and mitigation strategies that span site planning, structural design, architectural design, and critical infrastructure design.
Designing, building, and operating the most efficient, cost-effective, and safe data centers requires a true understanding of the tools available at every stage. The challenge is to design for the future, incorporating anticipated IT growth and managing power consumption while maintaining a mission-capable, reliable facility.
Energy efficiency and sustainability
Though the government’s role in protecting our planet’s natural resources has become much more pronounced in the past 20 years, it is amazing how quickly these forces have converged on the emerging problem posed by the proliferation of data center development.
The U.S. Environmental Protection Agency (EPA) reports that 1.5% of the country’s emissions come from data centers, a number that doubled from 2000 to 2006 and is on track to surpass the airline industry by 2020. Not only are the EPA and other federal agencies like the Dept. of Energy (DOE) and the General Services Administration (GSA) focused on data center development, but programs like Energy Star, U.S. Green Building Council LEED, Green Globes, and Green Grid have illuminated the gains that can be made in data center energy usage and efficiency. Three programs deserve mention and are part of the EPA and the DOE’s combined National Data Center Energy Efficiency Information Program.
• The Energy Star Program establishes a rating system that has been applied across a broad spectrum of commercial buildings. On a scale of 1 to 100, buildings must obtain an energy rating of 75 or above to receive the same sort of Energy Star label for superior energy management that has been applied to appliances like refrigerators and washing machines. So far, fewer than 7% of more than 60,000 commercial buildings have been able to meet this standard. Because data centers have an entirely different energy-use profile, however, the EPA, along with the DOE, Lawrence Berkeley National Laboratory, and several industry partners, developed the DC Pro software tool to measure energy effectiveness specifically in data centers. Although this tool is currently in the historical data gathering stage, it is expected to be rolled out after a performance model can be developed on material gathered from 100 national data centers.
• The DOE’s Save Energy Now is aimed at continual energy improvement across a data center’s development and management over time. In addition to its commitment to the DC Pro tool, Save Energy Now also is putting emphasis on data center assessments, training, certification, and best-in-class guidelines and lessons learned that can be applied across the entire data center, as well as its systems and subsystems.
• The Federal Energy Management Program (FEMP) specifically works through and with federal agencies like GSA and the Office of Management & Budget to promote DC Pro and other data center energy efficiency workshops, surveys, information, alliances, and recognition/awards programs. As federal agencies go through their own data center transformations, FEMP acts as their voice when working with Congress and the executive branch to promote policy and disseminate tools and resources that reduce the cost and environmental impact of energy use.
The industry is reacting to these initiatives in large measure through the development of new technologies and practices in the data center space. These innovations have come from such diverse directions as server optimization/virtualization; dc power (as opposed to ac power) direct to the rack and component; alternative power generation, like geothermal and fuel cells; and optimized cooling systems, like liquid cooling and “smart” sensors.
Coyle is vice president of federal programs for EYP Mission Critical Facilities, a company of HP, in its Washington, D.C., office. A seasoned professional with more than 20 years of experience in government contracting and procurement, Coyle has managed and led efforts on vertical construction on larger public works public-private partnership initiatives. Arno is principal and director of the C4ISR Group in the Utica, N.Y., office of EYP Mission Critical Facilities. A member of the IEEE and chairman of the Gold Book, Arno’s responsibilities span program management, electrical, and mechanical system analysis and modeling, and data collection and analysis.