How to predict electrical equipment failures

Electrical engineers can provide many services to help facility managers predict electrical equipment or power distribution system failures.

By Freddy Padilla, PE, ATD; Page, Austin, Texas. August 22, 2017

Learning objectives

  • Identify the top electrical and power equipment in a facility that needs an overhaul or replacement.
  • Explain how to calculate the reliability of an electrical system.
  • Evaluate infrastructure, budget, and other needs to provide facility managers with appropriate electrical and power systems.

Problem: Facility managers with aging power distribution systems often lack the comprehensive data they need to communicate the need for equipment overhaul or replacement to the managers responsible for budgets and to allied stakeholders. This communication gap limits facility managers’ ability to influence budgeting early enough, while significant failures are still avoidable.

Main point: Consulting engineers can provide a variety of services to assist facility managers in predicting equipment failures and communicating the impacts of the failures to other stakeholders (see Figure 1). Consulting teams also can help develop emergency operating procedures in the event of equipment failures and long-term equipment-replacement plans, both of which lead to better budgeting with fewer surprises.


There are many tools on the market that calculate the reliability of a system. However, it is important to understand their limitations. Most of the tools calculate reliability only for single-source systems, so additional calculations are required to assess multiple-source systems. Another challenge is that inputs such as mean time to repair (MTTR) are based on IEEE 493-2007: IEEE Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems, which in turn is based on a survey that covers a wide range of industries. Often, companies employing these standard tools find that their actual MTTR is longer than the time listed in the IEEE reference due to requirements such as root-cause analysis and safety reviews. Using the shorter IEEE repair times therefore yields higher calculated reliability values than the facility actually achieves, possibly affecting the equipment-replacement decision process. Establishing detailed procedures to remove equipment for repairs or maintenance can significantly reduce MTTR and increase system reliability.
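The effect of MTTR on calculated reliability, and the extra step needed for a multiple-source system, can be illustrated with a minimal sketch. It assumes the common steady-state availability formula, MTBF / (MTBF + MTTR), and assumes the two sources fail independently. All numbers are hypothetical and are not taken from IEEE 493.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the component is in service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*availabilities: float) -> float:
    """Single path: every component must be in service."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Redundant sources: at least one must be in service (independence assumed)."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Hypothetical inputs for one piece of equipment.
mtbf_hours = 100_000.0
survey_mttr_hours = 8.0   # assumed value from a generic survey
site_mttr_hours = 48.0    # assumed value once root-cause analysis and safety reviews are included

a_survey = availability(mtbf_hours, survey_mttr_hours)
a_site = availability(mtbf_hours, site_mttr_hours)
a_dual = parallel(a_site, a_site)  # two identical, independent sources

for label, a in [("survey MTTR", a_survey), ("site MTTR", a_site), ("dual source", a_dual)]:
    print(f"{label}: availability={a:.6f}, expected downtime={(1 - a) * HOURS_PER_YEAR:.3f} h/yr")
```

With the same failure frequency, the longer site-specific MTTR lowers the calculated availability, which is why a study built on survey repair times can look more favorable than the facility's real experience.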

Reliability-centered maintenance

Reliability-centered maintenance (RCM) refocuses analysis of a maintenance and replacement program to overall impact on operations, as opposed to the more traditional approach that focuses on equipment individually (see Figure 2).

Basically, an RCM program asks whether a given maintenance activity will introduce more risk than it mitigates. There always is an opportunity for failure in the process of performing maintenance or equipment replacement: A bolt may not get retorqued, a wrench might be left inside the switchgear, or a firmware update may have a bug in it. In addition, there often are business costs associated with maintenance, even if it goes perfectly—this is especially true for systems with a single power distribution pathway.

RCM programs are common in industries with extreme consequences in case of failure, such as aircraft maintenance. They are less common in power distribution systems for buildings and campuses. But RCM programs are arguably appropriate for all facilities that have far-reaching impacts on the communities or organizations they serve.

To create an RCM program, a consultant typically starts by developing standard maintenance procedures for each component of the power distribution system. The next step is to calculate the system reliability based on current procedures. This analysis helps quantify how well the existing procedures are performing.

The consultant then develops a failure modes and effects analysis (FMEA) for the power distribution system. The analysis determines the possible failures that could occur, the likelihood of their occurrence, and the impact each specific failure would have on the overall facility. If enough stakeholders within the facility owner’s organization participate, the analysis can expand to include impacts to the broader organization beyond the individual facility under consideration. Typically, the people involved in the FMEA are the people executing the work, and they generally focus on failures that relate only to their systems. However, when multiple stakeholders, such as information technology equipment owners or managers, are engaged in the process, the evaluation becomes more comprehensive and more effective in helping to identify the possible impacts of failures. This communication enables the creation of broader mitigation efforts and project management.
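One simple way to organize FMEA results is to score each failure mode by likelihood times impact and rank the list. The sketch below is only an illustration: the `FailureMode` structure, the scoring rule, and every number in it are assumptions, not the author's method or values from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    description: str
    likelihood_per_year: float  # assumed expected occurrences per year
    impact_cost: float          # assumed facility-wide cost per occurrence

    @property
    def annual_risk(self) -> float:
        """Expected annual cost of this failure mode (likelihood x impact)."""
        return self.likelihood_per_year * self.impact_cost


def rank_failure_modes(modes):
    """Return failure modes sorted from highest to lowest expected annual cost."""
    return sorted(modes, key=lambda m: m.annual_risk, reverse=True)


# Hypothetical entries gathered from facility and IT stakeholders.
modes = [
    FailureMode("Branch breaker", "nuisance trip", 0.50, 5_000),
    FailureMode("Main switchgear", "bus fault", 0.01, 2_000_000),
    FailureMode("UPS battery string", "cell failure", 0.20, 50_000),
]

ranked = rank_failure_modes(modes)
for m in ranked:
    print(f"{m.component:20s} {m.description:15s} expected cost/yr = {m.annual_risk:,.0f}")
```

In this made-up data set, the rare but severe switchgear fault outranks the frequent nuisance trip, which is the kind of result that broader stakeholder input to the impact estimates can change.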

After the FMEA is complete, the consultant can provide guidance regarding which maintenance activities bring value and which ones don’t. The results are often counter-intuitive to facility managers with many years of experience—especially within organizations that treat individual equipment failures with the same level of fear that they would a complete system outage.

Thus, it is very important that the consultant asks for buy-in from the facility manager at each step of the FMEA. Often, the FMEA demonstrates that it is better to run a particular piece of equipment to failure rather than proactively maintain or replace it. The results vary dramatically based on system configuration, the business impact of the facility, and the reliability of the maintenance program.

After an RCM system study is complete, a facility manager can create an RCM-based maintenance program. A comprehensive program provides written standard operating procedures, maintenance procedures, and emergency operating procedures. It also outlines a development plan for training maintenance staff and a 10-year maintenance and equipment-replacement plan including projected budgets. Finally, it identifies a plan for future system improvements where desired to increase system reliability.

It is critical that the consultant clearly documents all underlying assumptions used in an RCM system analysis. For example, a primary struggle with equipment replacement is how to deal with outages. This is particularly difficult in health care, where outages require relocation of patient care. Temporary power costs can equal or exceed the cost to replace the equipment. Replacement projects also will require design professionals to produce sealed drawings that the local health department or fire marshal can review, which adds to the total project cost. Projects that require competitive bidding for construction work also will undergo a design process and have associated engineering fees.

Executives often choose not to participate in the process of creating an RCM program. They also may not accept the results if the study recommends a higher-than-expected expenditure on maintenance.

Reliability studies depend heavily on the assumptions made about the future reliability of equipment and maintenance activities, and it is easy for those who do not participate in the assessment of these numbers to disagree with the results. A technique for eliminating possible disagreements is to analyze the sensitivity of each assumption and identify how much change would be required to each assumed value before it would affect the outcome of the study. This additional analysis work can then focus subsequent conversations on the assumptions that have the most impact on the study.
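The sensitivity technique described above can be sketched as a breakeven search: vary one assumption at a time and find the change at which the study's recommendation flips. The cost model, parameter names, and values below are hypothetical and chosen only to illustrate the idea. The search uses bisection on a scale factor between 0 and 10 times the assumed value.

```python
def net_benefit_of_replacing(a: dict) -> float:
    """Expected annual run-to-failure cost minus annualized replacement cost.

    A positive result favors proactive replacement in this simplified model.
    """
    cost_per_failure = a["outage_hours"] * a["cost_per_outage_hour"] + a["emergency_repair_cost"]
    return a["failure_rate_per_year"] * cost_per_failure - a["annualized_replacement_cost"]


def breakeven_change(assumptions: dict, name: str, lo: float = 0.0, hi: float = 10.0):
    """Fractional change in one assumption that flips the decision, or None.

    None means no scale factor between lo and hi changes the recommendation.
    """
    def f(scale: float) -> float:
        trial = dict(assumptions)
        trial[name] = assumptions[name] * scale
        return net_benefit_of_replacing(trial)

    if f(lo) * f(hi) > 0:
        return None
    for _ in range(60):  # bisection on the scale factor
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0 - 1.0


# Hypothetical study assumptions.
base = {
    "failure_rate_per_year": 0.05,
    "outage_hours": 24.0,
    "cost_per_outage_hour": 15_000.0,
    "emergency_repair_cost": 40_000.0,
    "annualized_replacement_cost": 15_000.0,
}

changes = {name: breakeven_change(base, name) for name in base}
for name, change in changes.items():
    text = "cannot flip the decision in the searched range" if change is None else f"{change:+.1%}"
    print(f"{name:30s} {text}")
```

Assumptions with the smallest breakeven change deserve the most discussion. An assumption that cannot flip the decision, such as the emergency repair cost in this example, can be set aside.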

Data

Analyzing existing systems and determining where to collect data is very important. Electrical distribution equipment has many components that can contribute to the system reliability. The components can be evaluated as a system or as individual elements. Many owners add subsystems, such as maintenance modes, power-quality meters, high-resistance grounding, and zone-selective interlocking, to the electrical distribution system. The equipment location (indoor or outdoor) and the adequacy of humidity and temperature controls also should be taken into consideration. All these additional components contribute to system reliability and affect the equipment life, equipment mean time to failure, mean time to repair, mean time between failures, and failure rates.

It is common for owners to keep track of information related to reliability to operate their facilities. Owners are generally interested in logging equipment failure, preventive maintenance, hours of use, the number of operations, thermal-scan inspections, torque logs, environmental conditions, etc. There are many ways to capture and record this type of information, such as through the use of data center infrastructure management, facility management control systems, custom databases, or simple spreadsheets.
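Even a simple failure log supports the basic reliability metrics. The sketch below assumes a hypothetical log of outage start and restoration times for one piece of equipment and uses one common convention: MTTF is uptime per failure, MTTR is downtime per failure, and MTBF is their sum. Other conventions exist, so the definitions should be confirmed against whichever reliability tool is in use.

```python
from datetime import datetime

HOURS_PER_YEAR = 8760.0


def reliability_metrics(outages, period_start, period_end):
    """Compute MTTF, MTTR, MTBF, and failure rate from (failed_at, restored_at) pairs."""
    if not outages:
        raise ValueError("at least one logged failure is required")
    total_hours = (period_end - period_start).total_seconds() / 3600.0
    downtime_hours = sum((end - start).total_seconds() / 3600.0 for start, end in outages)
    uptime_hours = total_hours - downtime_hours
    n = len(outages)
    mttf = uptime_hours / n    # mean time to failure
    mttr = downtime_hours / n  # mean time to repair
    return {
        "mttf_hours": mttf,
        "mttr_hours": mttr,
        "mtbf_hours": mttf + mttr,
        "failure_rate_per_year": HOURS_PER_YEAR / mttf,
    }


# Hypothetical two-year log for a single switchboard.
log = [
    (datetime(2015, 6, 1, 8, 0), datetime(2015, 6, 1, 20, 0)),
    (datetime(2016, 3, 10, 0, 0), datetime(2016, 3, 11, 12, 0)),
]
metrics = reliability_metrics(log, datetime(2015, 1, 1), datetime(2017, 1, 1))
for key, value in metrics.items():
    print(f"{key}: {value:.2f}")
```

The same calculation works whether the log lives in a spreadsheet, a custom database, or a facility management system, as long as failure and restoration times are recorded consistently.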

Across the industry, the trend is that all this information is collected but not analyzed to determine the status of the electrical distribution system. Having a single source of information for all data helps the process, but it is essential to look at the overall distribution system, not only individual components, to get a good picture of system status. Establishing metrics for the complete electrical distribution system, including all subsystems, and documenting system failures are fundamental to this process.

Another important factor to consider is that the additional subsystems often are not properly maintained and become a burden to the overall system reliability. A good example of this industrywide practice of neglecting subsystems is the failure to use and maintain maintenance-mode features on electrical switchboards, which often generate failures simply because the internal wiring has not been checked or used. A good way to resolve these issues is to implement a detailed maintenance procedure for equipment and ensure that all subsystems are tested, or removed if they are not needed. An experienced consultant might be required to distinguish malfunctioning subsystems from unnecessary ones.

How much do I really need?

For facility managers seeking approval to spend money on necessary equipment overhaul or replacement, the substantial cost of a complete system analysis and development of an entirely new approach to maintenance is often unrealistic and unnecessary.

For example, when owners of mission critical facilities request assistance with identifying the data that will help them develop a better understanding of equipment reliability, they are looking for information that is specific to their organization’s power distribution design and existing maintenance procedures. The challenge is to decide when it is time to replace the equipment, considering that the equipment has been maintained through the years according to manufacturer’s recommendations. The goal is to replace the equipment before failures occur. Another challenge is documenting the return on investment for maintenance and replacement. Tools such as reliability analysis and more in-depth maintenance procedures can help determine when replacement is required and can account for the costs incurred by an equipment failure and the associated maintenance or replacement.

Owners of facilities built more than 25 years ago who are confronting renovations, additions, or additional capacity requirements realize they must replace much of their power distribution infrastructure. However, they need help determining which equipment to replace first and how to perform replacements with minimal business impact.

Facilities with comprehensive monitoring systems already are collecting data. But often, owners do not have the ability to use the data to predict when equipment reliability will drop significantly or to communicate the meaning of the data effectively to less technical stakeholders.

There are various software tools that facilitate data collection and clearly show how the data can be used to improve communication with those responsible for budgets. Features of these tools include:

  • Repair, replace, and switching options are automatically evaluated.
  • Reliability indices are reported at each load and each bus.
  • IEEE indices are reported at each protective device.
  • Switches, fuses, and load reliability can be included in or excluded from calculations.
  • Transformers can be set to be repaired or replaced.
  • Costs are evaluated for standard utility-supply configuration alternatives.
  • Weighting preferences for operation, reliability, maintenance, recovery, and cost factors are user-defined.
  • Aging factors for failure rate and repair time are user-defined.
  • Reliability reports include mean time to failure, failure rate, MTTR, and annual outage.
  • Cost-evaluation reports include equipment lists and costs for utility and distribution systems.
  • Cost-evaluation reports include summary values based on user-defined weighting factors.
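The reliability indices named in the list above (mean time to failure, failure rate, MTTR, and annual outage) are related in a simple way for a load fed through a single chain of components. The sketch below uses the usual series approximation: failure rates add, and annual outage hours are the sum of each component's failure rate times its repair time. The component names and values are hypothetical, not IEEE 493 survey data, and the approximation assumes repair times are short compared with the time between failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    failures_per_year: float  # assumed failure rate
    repair_hours: float       # assumed average repair time per failure


def load_point_indices(chain):
    """Series-system indices for a load fed through every component in `chain`."""
    failure_rate = sum(c.failures_per_year for c in chain)
    annual_outage_hours = sum(c.failures_per_year * c.repair_hours for c in chain)
    return {
        "failure_rate_per_year": failure_rate,
        "mttf_years": 1.0 / failure_rate,
        "mttr_hours": annual_outage_hours / failure_rate,
        "annual_outage_hours": annual_outage_hours,
    }


# Hypothetical single-source path from the utility to one load.
chain = [
    Component("Utility feed", 1.0, 2.0),
    Component("Transformer", 0.01, 200.0),
    Component("Switchgear breaker", 0.005, 40.0),
]

indices = load_point_indices(chain)
for key, value in indices.items():
    print(f"{key}: {value:.3f}")
```

In this example the transformer fails rarely but contributes as many outage hours per year as the utility feed, which shows why failure rate alone is a poor basis for replacement priorities.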

Looking for ways to predict failures

RCM is an old practice that has been used by the military for many years. Today, concerns about the reliability of electrical distribution equipment are making owners consider this kind of methodology to minimize system failures. The industry is looking for ways to predict failures in existing equipment and to understand the impact of the failures on day-to-day operations. Many tools on the market can help quantify risk, but they cannot accurately predict what will happen to a system without the correct data. This data should be specific to how the owner operates and maintains the facilities. The data, which is very time-consuming to collect, must be gathered from multiple stakeholders within the organization to provide the most effective foundation for analysis and planning.

Many facilities are realizing the importance of the electrical distribution system and the need to minimize failures, and for that reason, new electrical distribution systems are being designed with fewer components and with only the essential features that a client needs. Today’s approach is moving to quantify risk with reliability tools, such as the one provided by IEEE 493, and to reevaluate infrastructure to simplify and minimize components in the electrical distribution system.


Freddy Padilla is principal/mechanical, electrical, plumbing engineering director at Page. He earned his Uptime Tier Designers accreditation in 2013. His experience includes a variety of engineering services for mission critical, science and technology, industrial, municipal, commercial, government, educational, and health care sectors. He is a member of the Consulting-Specifying Engineer editorial advisory board.