Commission Critical

Editor's Note: In this issue, we begin a new approach to our "Professional Practices" column. The topic addressed each month will be rotated within four broad categories of coverage: management and engineering practices; information technology; legal issues; and commissioning.

By Roy H. Feinzig, P.E., and Bruce Storms, Principals, EYP Mission Critical Facilities, Inc., New York
February 1, 2003

Commissioning is a vital part of all building systems design. But mission-critical facilities (MCFs) require their own unique commissioning procedures and schedules.

Testing must be performed not only in normal operating modes, but also in all possible failure modes to assure seamless transition to backup and redundant systems. Moreover, total system integration testing is essential for ensuring that the electrical and mechanical infrastructure will perform according to design intent and the owner’s operational needs.

Thorough commissioning of an MCF can seem expensive compared to traditional facility commissioning, but the cost of a system failure resulting from improper commissioning could be devastating. The Cost of Downtime table at the end of this article provides average hourly downtime costs for various types of mission-critical operations.

So, what makes a building or system mission-critical? A building or system is mission-critical when even a momentary failure of a system, process or function can affect the entity’s operations.

Classic examples of MCFs are data and telecom centers, financial institutions, pharmaceutical labs and continuous-production or -process facilities. But almost all buildings today use one or more systems or processes that can be considered mission-critical.

An MCF doesn’t necessarily operate 24/7. For example, the New York Stock Exchange is mission-critical, but generally speaking, only during trading hours. Similarly, it is not necessary for an entire facility or complex to be critical for that facility to serve a mission-critical function. For example, centralized ATM processing may take up less than 10% of a total banking facility’s resources. But if ATM processing is critical to the bank’s business objectives, then that facility serves a mission-critical function.

Focus on reliability

The benchmark for mission-critical functions is 99.999% power reliability, which translates into just over five minutes of unscheduled downtime in a year. Commonly referred to as “five nines,” this degree of reliability cannot be achieved without multiple levels of power system redundancy.
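The arithmetic behind the "five nines" figure can be checked with a short calculation. The following is a minimal sketch, not part of any commissioning standard; the function name is hypothetical and a 365-day year is assumed.

```python
# Convert an availability percentage into allowable downtime per year.
# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability_percent):
    """Return the unscheduled downtime, in minutes per year, implied by an availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = annual_downtime_minutes(pct)
        print(f"{pct}% availability -> {minutes:,.1f} minutes of downtime per year")
```

For 99.999% this works out to roughly 5.3 minutes per year, while 99% allows about 3.65 days.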

And redundancy requires additional equipment running idle at all times, so that if any part of a system fails, redundant equipment seamlessly assumes the load, or loads, of the faulted equipment. But extra equipment running all the time means added operating costs, thereby making energy efficiency a conflicting interest in an MCF.

Although efficient equipment should be chosen where possible, an MCF cannot be commissioned with energy efficiency as a primary focus. Issues such as IAQ and employee comfort are also concerns in certain types of MCFs, such as trading floors and financial institutions, but they are secondary to reliability.

Mechanical systems

The environment of a data center must be closely controlled, much more than for “people space.” Data center equipment can be adversely affected by temperature and humidity extremes and, more importantly, by the rate of temperature change. Building owners and computer equipment manufacturers have established standards for temperature and humidity ranges, as well as for allowable rate of change, typically 3

The building automation system (BAS), which typically controls the sequencing of all HVAC equipment, must be tested for correct operation of every control point and sequence in all possible operating conditions. Proper alarming and seamless transfer to redundant equipment must be verified.

Electrical systems

Failure of a mechanical system often allows sufficient time for facility engineers to take necessary corrective actions due to the relatively slow rate of change in environmental conditions. However, a power interruption of greater than 8.83 milliseconds to critical loads can result in shutdown of, or damage to, sensitive computers.

Consequently, MCF commissioning focuses heavily on the electrical power protection systems. Power quality is vital. Most mission-critical applications require high-quality, uninterrupted power to the critical loads during all operating conditions. Typical utility reliability is approximately 99%, far too low for an MCF. To increase the overall system reliability, additional power protection equipment is required.
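The effect of adding redundant power sources can be approximated with a simple parallel-availability calculation. The sketch below is illustrative only: it assumes that each power path fails independently and that transfer between paths is seamless, which real systems only approximate. The function names are hypothetical.

```python
# Idealized availability of redundant power paths.
# Assumptions: paths fail independently and any one healthy path can carry the load.


def parallel_availability(availabilities):
    """Availability of a set of redundant paths, given each path's availability (0..1)."""
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
        all_down *= 1.0 - a  # probability that every path is down at once
    return 1.0 - all_down


def paths_needed(path_availability, target):
    """Smallest number of identical independent paths that meets the target availability."""
    if not 0.0 < path_availability <= 1.0:
        raise ValueError("path availability must be greater than 0 and at most 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    count = 1
    while parallel_availability([path_availability] * count) < target:
        count += 1
    return count


if __name__ == "__main__":
    print(parallel_availability([0.99, 0.99]))
    print(paths_needed(0.99, 0.99999))
```

Under these idealized assumptions, two independent 99% paths yield 99.99%, and a third is needed to exceed five nines. Common-mode and transfer failures, which commissioning is meant to uncover, reduce the real-world figure.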

The components of a typical MCF power protection system include an uninterruptible power supply (UPS), the engine generator system, automatic transfer switches and distribution systems. These components must first be commissioned individually, then commissioned again as a fully integrated critical-power system.

Full-load system testing is performed on the entire facility to simulate the full-load operating and failure mode conditions to which the facility will be subjected. This should not be confused with HVAC capacity testing.

Reliability assurance testing (also known as continual commissioning) of major components of the critical power system is necessary to identify component degradation. It is recommended that continual commissioning be performed on a semi-annual or annual basis, depending on the equipment type.

Commissioning for an MCF’s electrical power system also requires specialized monitoring and measurement equipment. Because the flow of electrons is far more discrete than the flow of air or water in mechanical systems, the electrical systems require more sophisticated test instruments to ensure that major assemblies—and even subassemblies—of each piece of the critical power system are operating properly.

Typically, instruments used by the equipment vendor are inadequate for measuring and recording high-speed electrical transients. The use of high-tech instrumentation and metering by an independent, third-party commissioning agent provides an unbiased assurance that all systems are operating properly.

Proper verification of equipment operation often involves duplicating many portions of the vendor’s equipment start-up/site acceptance testing activities, which results in additional site time being expended on the project. But given the abnormally high cost of system failure at MCFs, and the large number of load-threatening deficiencies and failures identified during past MCF commissioning projects, the benefits of duplicating some vendor activities more than justify the additional cost.

Acceptance testing

The commissioning agent, as part of the site acceptance testing phase of commissioning, must verify the major functions of every piece of critical equipment in systems such as electrical, HVAC and fuel oil. Commissioning of any facility includes acceptance testing in all normal modes of operation, but a design intent that achieves five nines incorporates multiple levels of component and system-level redundancy. Failure-mode testing is therefore necessary to prove that this design intent is met.

When an MCF is a 24/7 operation, it is imperative that all possible operational scenarios are covered during the commissioning process.

However, once such facilities are operational, there is substantial risk associated with failure-mode testing. Every potential point of failure must be examined for every piece of equipment and every system in the facility. Then a detailed test procedure must be written for each scenario to ensure that the design intent is truly met.
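One way to keep track of this bookkeeping is to enumerate every equipment and failure-mode pairing so that none is missed when the procedures are written. The following is a minimal sketch: the equipment names come from the power protection components discussed above, while the failure modes, the scenario ID format and the function name are hypothetical placeholders that a real project would replace with its own lists.

```python
# Enumerate failure-mode test scenarios so each one gets a written procedure.
from itertools import product

# Equipment taken from the article; a real project would list every critical item.
EQUIPMENT = ["UPS", "engine generator", "automatic transfer switch"]

# Hypothetical failure modes, for illustration only.
FAILURE_MODES = ["loss of input power", "component fault", "control signal loss"]


def build_test_matrix(equipment, failure_modes):
    """Return one scenario record per (equipment, failure mode) pair."""
    matrix = []
    for number, (item, mode) in enumerate(product(equipment, failure_modes), start=1):
        matrix.append({
            "id": f"FM-{number:03d}",    # hypothetical scenario numbering scheme
            "equipment": item,
            "failure_mode": mode,
            "procedure_written": False,  # updated as each procedure is drafted
        })
    return matrix


if __name__ == "__main__":
    for scenario in build_test_matrix(EQUIPMENT, FAILURE_MODES):
        print(scenario["id"], "-", scenario["equipment"], "-", scenario["failure_mode"])
```

A flat list like this is only a starting point; combined failures across systems would need additional scenarios.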

Integrated testing

All problems identified during the site acceptance testing phase of commissioning must be resolved or repaired before the commencement of integrated testing. Integrated system testing of all failure modes must be performed while the facility is operating at the full electrical design load, which can become extremely costly. However, the quantifiable prevention of failure, as well as the peace of mind achieved by physical verification of all operating modes, more than justifies the additional costs involved.

As for integrated testing of the mechanical systems, keep in mind that it is not possible to achieve “design day” conditions—in other words, the worst case outside temperature and humidity at full electrical loading—during the commissioning process.

The desired environmental trends for the integrated testing should be reviewed with the BAS vendor prior to performing the test to verify the BAS is capable of recording the necessary information, displaying it in charts and providing an electronic copy of the results in a spreadsheet-compatible format for further analysis.

Typically, the BAS is capable of charting trends in space temperature and humidity as well as supply and return water temperatures for chiller plants and supply and return air temperatures for air-handling equipment. Data loggers should be used if the BAS cannot provide all of the desired information.
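Once trend data has been exported in a spreadsheet-compatible format, the rate of temperature change can be checked with a few lines of code. The sketch below assumes a CSV export with ISO-formatted timestamps; the column names `timestamp` and `space_temp` and the function name are hypothetical, and the allowable limit must come from the owner's or manufacturer's standard.

```python
# Find the steepest rate of change in an exported BAS or data-logger trend.
import csv
import io
from datetime import datetime


def max_rate_of_change(csv_text, time_col="timestamp", value_col="space_temp"):
    """Return the largest absolute change per hour between consecutive samples."""
    rows = csv.DictReader(io.StringIO(csv_text))
    samples = sorted(
        (datetime.fromisoformat(row[time_col]), float(row[value_col])) for row in rows
    )
    worst = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        if hours <= 0:
            continue  # skip duplicate timestamps
        worst = max(worst, abs(v1 - v0) / hours)
    return worst


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = (
        "timestamp,space_temp\n"
        "2003-02-01T10:00:00,72.0\n"
        "2003-02-01T10:15:00,72.5\n"
        "2003-02-01T10:30:00,74.0\n"
    )
    print(f"Steepest change: {max_rate_of_change(sample):.1f} degrees per hour")
```

The result can then be compared against the allowable rate of change specified for the equipment.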

Integrated testing is a cumulative exercise to prove the reliability of the overall design and compatibility between all critical systems—electrical, mechanical and environmental. The intent of such testing is to simulate real-life conditions that the facility could be subjected to on any given day during operations. The best method to verify total facility integration is to simulate a full building design load, remove the building utility source—pull the plug—and monitor all systems as they respond. Resistive load banks, which convert electrical energy into heat, are used to produce this load. Integrated testing is the only way to demonstrate that all critical systems function collectively.

Of course, the bottom line in such efforts is to prevent equipment downtime, the effects of which should be spelled out for the client. Revenue loss may take a number of forms:

Direct costs. Potential direct losses include personnel productivity loss, equipment damage and data destruction, and residual revenue losses, including compensatory losses, lost future revenue, billing losses and investment losses. Keep in mind that the figures in the Cost of Downtime table are taken from a 1996 survey. Costs today would be considerably higher. A more recent survey by the same firm showed 54% of respondents reporting downtime costs in excess of $50,000 per hour, with 4% reporting downtime costs in excess of $5 million per hour.

Compensatory damages. Downtime costs for a financial company can be enormous. For example, a company in the World Financial Center had a single server overheat and fail. This server was dedicated to a client who had a guaranteed service agreement with a compensatory damages clause of $1 million per day. The server could not be repaired, and at the time of failure, replacement lead-time on that particular model was several weeks.

Financial performance. An Internet auction company had several outages over a three-day period. The third time their web site went down—a 22-hour outage—their stock price dropped $47.50 per share, a reduction in market capitalization of $6.1 billion. Although their stock price later rebounded, they were forced to refund $3 million to $5 million to customers who had listed goods on the site before it crashed. To protect against future failures, the company invested $14 million in disaster recovery technologies.

Reputation damage. In the days following the Internet auction site’s failure, volume on its two main competitors’ sites increased by 50%, obviously at its expense. A single service interruption can cause a long-distance carrier to lose several hundred thousand subscribers to its competitors. Reputation damage from a failure can extend to a company’s credit rating, suppliers, banks, business partners and investors. Reputation damage has the capacity to exceed revenue losses, possibly to the extent that the losses are unrecoverable.

Continued recovery time. When power is restored after an outage, there may be considerable time expended replacing damaged hardware, recovering lost data and restoring software. Hours may be spent waiting for the maintenance personnel, and replacement of damaged equipment could take weeks, adding significant downtime beyond the time expended simply restoring power.

The value of up-front commissioning of mission-critical operations, in both full operational and testing-to-failure modes, cannot be denied. It is better to spend extra time and money on commissioning than to suffer from financial and reputation damage.

Table – Cost of Downtime

Type of Business                           Hourly Cost of Downtime
Brokerage operation                        $6,450,000
Pay-per-view services                      $150,000
900 phone number services                  $54,000
Airline reservation centers                $89,500
ATM service fees                           $14,500
Catalog sales centers                      $90,000
Cellular service activation                $41,000
Credit card sales authorization            $2,600,000
Home shopping channels                     $114,000
Infomercial 800 phone number promotions    $200,000
Package shipping service requests          $28,000
On-line network connection fees            $25,000
Telephone ticket sales                     $69,000
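
As a rough illustration of how the hourly figures in the table scale with outage length, the sketch below multiplies an hourly cost by an outage duration. It assumes cost accrues linearly over the outage, which ignores the recovery time and reputation effects described in the article; the function name is hypothetical, and the hourly figures are the 1996 survey values from the table.

```python
# Estimate the direct cost of an outage from an hourly downtime figure.
# Assumption: cost accrues linearly with outage duration.

# Selected hourly costs from the table (1996 survey figures, in dollars).
HOURLY_COST = {
    "Brokerage operation": 6_450_000,
    "Credit card sales authorization": 2_600_000,
    "ATM service fees": 14_500,
}


def outage_cost(hourly_cost, outage_minutes):
    """Return the estimated cost of an outage lasting the given number of minutes."""
    if hourly_cost < 0 or outage_minutes < 0:
        raise ValueError("cost and duration must be non-negative")
    return hourly_cost * outage_minutes / 60.0


if __name__ == "__main__":
    for business, hourly in HOURLY_COST.items():
        print(f"{business}: 30-minute outage costs about ${outage_cost(hourly, 30):,.0f}")
```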