By David Felin, Missouri Enterprise Project Manager

“Reliability Centered Maintenance: a process used to determine what must be done to ensure that any physical asset continues to do what its users wanted it to do in its present operating context.” – John Moubray

In the 1960s, the failure rate among first-generation jet aircraft was considered unacceptable. Two engineers from United Airlines, Stanley Nowlan and Howard Heap, began researching the causes of failure in the air travel industry. That research led to reliability centered maintenance (RCM), which was first described in a 1978 report by Nowlan and Heap for United Airlines. Their report began as follows:

“This volume provides the first discussion of Reliability Centered Maintenance as a logical discipline for the development of scheduled maintenance programs. The objective of such programs is to realize the inherent reliability capabilities of the equipment for which they are designed, and to do so at minimum cost. Each scheduled maintenance task in an RCM program is generated for an identifiable and explicit reason. The consequences of each failure possibility are evaluated, and the failures are then classified according to the severity of their consequences. Then for all significant items, those whose failure involves operating safety or has major economic consequences, proposed tasks are evaluated according to specific criteria of applicability and effectiveness. The resulting scheduled maintenance program thus includes all the tasks necessary to protect safety and operating reliability, and only the tasks that will accomplish this objective.”

Stan Nowlan continued his research and in 1983 began collaborating with John Moubray to adapt RCM to general industry. This work gave rise to RCM2, as Moubray called the general version, and to Moubray’s book Reliability-Centered Maintenance. Moubray formed a company called the Aladon Network to teach the techniques of RCM. He realized the benefits of RCM by adapting the methodologies of the air travel industry into universal techniques for establishing maintenance strategies wherever a physical asset is required to function reliably.

Reliability centered maintenance is used to establish safe minimum levels of maintenance for physical assets. RCM can be used to create a cost-effective overall maintenance strategy that addresses the dominant causes of equipment failure, and it should be part of any risk management program. RCM defines a complete maintenance model that covers run-to-failure, preventive maintenance, and predictive maintenance modes. It emphasizes the use of Predictive Maintenance (PdM) techniques in addition to traditional preventive measures, but RCM should not be thought of only as a predictive maintenance strategy. Rather, the maintenance method selected is based on an examination of an asset’s risk of failure and the possible effects of such a failure. Successful implementation of RCM leads to increases in reliability and machine uptime.

The first step in RCM is identifying the machines that must be included within a routine maintenance program. Methods such as Pareto analysis of equipment downtime, reliability assessments or other relevant metrics may be used. The goal is to focus RCM resources on equipment that will provide the maximum benefit to the organization’s operational priorities.
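The Pareto screening mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed RCM procedure: the function name `pareto_select`, the 80% cutoff, and the downtime figures are all assumptions made up for the example.

```python
def pareto_select(downtime_hours, cutoff=0.8):
    """Return the machines that together account for about `cutoff`
    of total downtime, worst offender first.

    `downtime_hours` maps a machine name to its recorded downtime.
    The 0.8 default is the conventional 80/20 rule of thumb, assumed here.
    """
    total = sum(downtime_hours.values())
    if total <= 0:
        return []

    # Rank machines from most to least downtime.
    ranked = sorted(downtime_hours.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    cumulative = 0.0
    for machine, hours in ranked:
        # Stop once the machines already chosen cover the cutoff share.
        if cumulative / total >= cutoff:
            break
        selected.append(machine)
        cumulative += hours
    return selected


# Hypothetical annual downtime log, in hours.
downtime = {
    "press_1": 120,
    "conveyor_A": 45,
    "lathe_3": 20,
    "packer_2": 10,
    "chiller": 5,
}
candidates = pareto_select(downtime)
print(candidates)
```

With these made-up numbers, two of the five machines account for more than 80% of the downtime, so they would be the first candidates for an RCM analysis.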

Once the machines have been selected, their dominant failure modes and causes and the consequences of those failures must be determined through Failure Mode, Effects, and Criticality Analysis (FMECA). Levels of criticality are assigned to the consequences of failure. Some non-critical functions may be left to run to failure, while others might qualify for preventive or predictive maintenance. For the most critical assets or functions, those that should be kept operational at all costs, it may become necessary to keep stores of spare parts on hand.

The RCM methodology is defined by the technical standard SAE JA1011, Evaluation Criteria for RCM Processes, which sets out the minimum criteria that a maintenance program must meet before it can be considered an RCM program.

The process starts with the seven questions below, worked through in the order that they are listed:

  1. What is the item supposed to do, and what are its associated performance standards?
  2. In what ways can it fail to provide the required functions?
  3. What are the events that cause each failure?
  4. What happens when each failure occurs?
  5. In what way does each failure matter?
  6. What systematic task can be performed proactively to prevent, or to diminish to a satisfactory degree, the consequences of the failure?
  7. What must be done if a suitable preventive task cannot be found?

The second part of the analysis is to rank the potential causes of failure and determine the appropriate maintenance tasks using the FMECA method. The decision of which strategy to employ for each potential failure mode may be based on experience, a pre-defined rubric connected to the failure effects categorization, cost comparisons, or some combination of these factors.

When safety is not an issue, another method is to compare normalized cost values for the available strategies and select the maintenance task that provides the desired level of availability for the minimum cost. For example, if the cost per unit of uptime of allowing a machine to run to failure is less than the cost per unit of uptime of performing a scheduled repair or replacement, and the run-to-failure approach provides an acceptable level of equipment availability, then it may be determined that no scheduled maintenance tasks are required for the equipment.
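The cost comparison described above can be worked through with a short Python sketch. It assumes that cost is normalized as total annual cost divided by uptime hours and that availability is uptime divided by total scheduled hours; the `Strategy` class, the `cheapest_acceptable` helper, and every figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    annual_cost: float     # assumed: maintenance plus failure costs per year
    uptime_hours: float
    downtime_hours: float

    @property
    def availability(self):
        # Fraction of scheduled time the equipment is actually running.
        return self.uptime_hours / (self.uptime_hours + self.downtime_hours)

    @property
    def cost_per_uptime_hour(self):
        # Normalized cost used to compare strategies on an equal footing.
        return self.annual_cost / self.uptime_hours


def cheapest_acceptable(strategies, min_availability):
    """Return the lowest cost-per-uptime strategy that meets the
    availability target, or None if no strategy meets it."""
    acceptable = [s for s in strategies if s.availability >= min_availability]
    if not acceptable:
        return None
    return min(acceptable, key=lambda s: s.cost_per_uptime_hour)


# Hypothetical figures for one machine over a 2,000-hour year.
options = [
    Strategy("run_to_failure", annual_cost=8000, uptime_hours=1900, downtime_hours=100),
    Strategy("scheduled_replacement", annual_cost=12000, uptime_hours=1960, downtime_hours=40),
]

choice = cheapest_acceptable(options, min_availability=0.93)
print(choice.name, round(choice.cost_per_uptime_hour, 2), round(choice.availability, 3))
```

In this made-up case run to failure is cheaper per uptime hour and still clears a 93% availability target, so no scheduled task would be required. Raising the target to 97% rules out run to failure, and the scheduled replacement becomes the only acceptable option.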

Similarly, for failures caused by unpredictable events, it may be determined that no action is required, provided the risk, as determined by a combination of the severity and frequency ratings, is minimal or acceptable. In other words, sometimes doing nothing is acceptable, but you have to be able to make that determination. Conversely, when the risk of failure is very high, RCM requires the user to implement some action that will reduce the risk to an acceptable level, including more advanced techniques such as preventive or predictive maintenance.
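One common way to combine severity and frequency ratings is to multiply them into a single risk score, then rank failure modes and compare each score against agreed thresholds. The Python sketch below illustrates that idea only: the 1 to 5 rating scales, the threshold values, the middle "review" band, and the failure modes are all assumptions, and a real program would take them from the organization's own FMECA rubric.

```python
def risk_score(severity, frequency):
    """Combine two ratings (assumed 1 to 5 scales) into a single score."""
    for rating in (severity, frequency):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return severity * frequency


def suggest_action(severity, frequency, acceptable=4, high=15):
    """Map a risk score to a broad course of action.

    The `acceptable` and `high` thresholds are hypothetical placeholders.
    """
    score = risk_score(severity, frequency)
    if score <= acceptable:
        return "no scheduled action"
    if score >= high:
        return "preventive or predictive task required"
    return "review: weigh task cost against failure consequences"


# Hypothetical failure modes as (description, severity, frequency).
failure_modes = [
    ("bearing seizure", 4, 4),
    ("belt wear", 2, 2),
    ("sensor drift", 3, 2),
]

# Rank from highest to lowest risk so the worst cases are handled first.
ranked = sorted(failure_modes, key=lambda fm: risk_score(fm[1], fm[2]), reverse=True)

for description, severity, frequency in ranked:
    print(description, risk_score(severity, frequency), suggest_action(severity, frequency))
```

Keeping the thresholds as parameters makes the "doing nothing is acceptable" decision explicit and repeatable rather than a case-by-case judgment.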

Maintenance tasks are then finalized and a schedule of maintenance is prepared and implemented. The result is a maintenance program that focuses potentially scarce resources on those assets that would cause the most disruption if they were to fail.