Best practices for proper alarm management

Environment and Safety

B. STRAKER, JEM Advisors, Houston, Texas; T. HANSON, JEM Advisors, Mississippi; R. KENYON, JEM Advisors, Temecula, California; G. LADNIER, JEM Advisors, Daphne, Alabama; G. McLEOD, JEM Advisors, San Rafael, California; T. SAUVAIN, JEM Advisors, San Rafael, California

Process alarms are critical safeguards put into place on operating units to enable operators to avoid incidents that impact a company’s safety, reliability and bottom line. It is absolutely essential to any safe facility operation that alarms are properly managed and utilized to address real and consequential operational issues before they become incidents. 

How can alarm management performance be evaluated? For example, some top-performing line operations managers request a snapshot of the alarm screen each morning, with an explanation of the cause of any active alarms and what has been done to address them. Unfortunately, this can have unintended consequences: operators may deliberately reduce or eliminate alarms by deactivating them, giving managers a very small, non-representative list of active alarms each day (sometimes with zero active alarms) that does not reflect the actual state of operations.

Some of these top performers mistakenly believe that their desired “performance” has been achieved, when in fact their console operators have put problem alarms into a “shelved,” “bypassed,” “inhibited” or other “named state” that takes them out of easy view and effectively removes them as safeguards against serious incidents. Some console operators do this to make it easier to operate without distractions. Most do not understand that the safety of their operations is at risk when they disable critical alarms, regardless of their reasoning. To combat this issue, most consoles have timers that prevent an alarm “bypass” from remaining active beyond the current shift, but these timers allow the bypass to be extended easily by the oncoming console operator, who simply “selects all” and authorizes the extension. This is an unacceptable process.

Case study. A company’s senior vice president (SVP) of operations had cause to visit a floating production storage and offloading (FPSO) facility. The offshore installation manager (OIM) had prepared an agenda, but it was approaching 6 pm and the OIM had planned a dinner with the leadership team. However, after disembarking from the helicopter and being taken through the health, safety and environmental (HSE) briefing, the SVP immediately donned his personal protective equipment (PPE) and requested a visit to the control room. There were some puzzled looks on the faces of the crew, but the SVP was clearly setting an example of good, strong leadership. On entering the control room, his first request was to see the inhibits and overrides register, from which he randomly selected a few items and asked why these inhibits had been in place for so long. This was a great demonstration of a senior executive who was well-versed in operations and, more importantly, operational risk.

Proper alarm management. Good leaders “walk the talk”: they make their expectations clear to all stakeholders, especially front-line supervisors, and make very clear the consequences of non-compliance when managing risk.

Monitoring and action by leadership and support organizations to address the root causes of excessive and nuisance alarms are required to ensure safe and incident-free operation. It is critical to have the alarms that are “inhibited” or “shelved” captured daily via metrics, so that management understands the current liabilities and can reinforce the actions that impact alarm management (FIG. 1). Many companies have added a distributed control system (DCS) report each morning, called a point attribute report (PAR), that lists each point that has had its alarm status manually changed. This report, combined with the alarm screenshot report or a similar indicator, is a better measure of proper alarm management.

FIG. 1. How would you handle it? Source: Roman Tingle.
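As a rough illustration of what a PAR might contain, the sketch below compares an approved alarm configuration against the current state of each point and lists every point that differs. The tag names, the state labels and the function `point_attribute_report` are hypothetical and do not correspond to any particular DCS vendor's reporting tool; a real report would be generated from the DCS itself.

```python
def point_attribute_report(baseline, current):
    """List points whose current alarm state differs from the approved baseline.

    Both arguments are dicts mapping a point tag to an alarm state string.
    The state labels ("ENABLED", "SHELVED", ...) are assumptions for illustration.
    Returns a list of (tag, baseline_state, current_state), sorted by tag.
    """
    changes = []
    for tag in sorted(baseline):
        # A point missing from the current snapshot is itself worth reporting.
        current_state = current.get(tag, "MISSING")
        if current_state != baseline[tag]:
            changes.append((tag, baseline[tag], current_state))
    return changes


# Hypothetical example data: three points, two of which were manually changed.
approved = {"PT-101": "ENABLED", "LT-205": "ENABLED", "TT-310": "ENABLED"}
this_morning = {"PT-101": "ENABLED", "LT-205": "SHELVED", "TT-310": "INHIBITED"}

for tag, was, now in point_attribute_report(approved, this_morning):
    print(f"{tag}: {was} -> {now}")
```

Read alongside the alarm screenshot, a list like this shows whether a quiet alarm screen reflects a quiet unit or simply a set of silenced alarms.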

Proper alarm management that enables operators to avoid serious incidents includes an alarm management strategy comprising the following: 

  • A DCS engineer key that is properly controlled to inhibit critical alarms  
  • The installation of timers on high-impact alarms for non-critical equipment that limit how long these alarms can remain inhibited
  • A console operator and line management review of the point attribute report (PAR) on each shift, and a second-level review of alarm screenshots and PARs on a daily basis
  • A process in place to review frequently inhibited alarms to remove or change alarm points or otherwise address the root cause(s) 
  • Hazard and operability analyses (HAZOPs) and alarm objective analyses at a regular, set frequency, and also after any incident investigation, to reduce the alarm redundancy that leads to alarm overload
  • A rigorous alarm objective analysis of all new alarms (before final HAZOP) to ensure that new equipment is not over-alarmed.
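Two of the items above, time limits on inhibits and the review of frequently inhibited alarms, lend themselves to a simple check against the inhibits register. The following is a minimal sketch under stated assumptions: the `InhibitRecord` structure, the 12-hr limit and the threshold of three inhibits are all hypothetical placeholders, and each site would set its own values in its alarm philosophy.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class InhibitRecord:
    """One entry in a hypothetical inhibits register."""
    tag: str
    inhibited_at: datetime
    restored_at: Optional[datetime] = None  # None means the alarm is still inhibited


def overdue_inhibits(records, now, limit=timedelta(hours=12)):
    """Return still-active inhibits that have been in place longer than `limit`.

    The 12-hr default is an assumed shift length, not a recommended value.
    """
    return [r for r in records
            if r.restored_at is None and now - r.inhibited_at > limit]


def frequently_inhibited(records, min_count=3):
    """Return {tag: count} for tags inhibited at least `min_count` times.

    These are candidates for a root-cause review rather than repeated inhibition.
    """
    counts = Counter(r.tag for r in records)
    return {tag: n for tag, n in counts.items() if n >= min_count}
```

The first function supports the kind of question the SVP asked in the case study (why has this inhibit been in place for so long?), while the second points to alarms whose set point, priority or underlying equipment problem should be addressed.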

 

As part of an alarm management strategy, a recent review of a month’s alarm activities for a typical facility, where reported performance was good with no issues around alarm avalanche or alarm overload for the console operators, is shown in TABLE 1.

Since these are averages for the day across 144 10-min periods, potential alarm overload issues may be hidden: they may have occurred during a short period but are diluted by other periods of no alarms during the day. Reporting the number of 10-min periods with alarm overload conditions and providing the ability to examine the details of each may provide a more insightful report. Further, a detailed review of inhibited alarms should be undertaken to understand whether these data are a realistic picture of the console alarms.
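The difference between a daily average and a count of overloaded periods can be sketched in a few lines. The example below assumes alarm timestamps are available as a simple list for one console and one day; the function names and the overload threshold of 10 alarms per 10-min period are illustrative assumptions, and a site should use the threshold defined in its own alarm philosophy.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW_MINUTES = 10
WINDOWS_PER_DAY = 24 * 60 // WINDOW_MINUTES  # 144 periods of 10 min


def window_counts(alarm_times, day_start):
    """Count alarms in each 10-min period of the day beginning at `day_start`."""
    counts = Counter()
    for t in alarm_times:
        index = int((t - day_start).total_seconds() // (WINDOW_MINUTES * 60))
        if 0 <= index < WINDOWS_PER_DAY:  # ignore alarms outside this day
            counts[index] += 1
    return counts


def daily_average(alarm_times, day_start):
    """Average alarms per 10-min period over the whole day."""
    return sum(window_counts(alarm_times, day_start).values()) / WINDOWS_PER_DAY


def overload_windows(alarm_times, day_start, threshold=10):
    """Return {period_start: alarm_count} for periods at or above `threshold`.

    The default threshold is an assumption for illustration only.
    """
    counts = window_counts(alarm_times, day_start)
    return {day_start + timedelta(minutes=i * WINDOW_MINUTES): n
            for i, n in sorted(counts.items()) if n >= threshold}
```

For example, a day with 20 alarms inside a single 10-min period and four more spread across the rest of the day averages about 0.17 alarms per period, which looks excellent, yet `overload_windows` reports one period in which the console operator faced 20 alarms.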

Console 1 showed good performance until the last two days of the month: a major upset occurred on Day 30, with alarm rates well beyond what could be managed. A detailed review of the upset is warranted, with modifications made so that the response is manageable in future events.

Consoles 2 and 3 showed the best performance of the group, which would be considered good, assuming that the data realistically reflect actual alarms without inhibition.

Console 4 showed issues mid-month for 8 d. Understanding what caused the elevated alarm conditions is warranted with corrections made to prevent recurrence. 

Consoles 5 and 6 require major alarm management reviews to correct their ongoing alarm overload. A steady diet of alarms every few minutes is not a reasonable workload for console operators and will prevent the safe and reliable unit performance that comes with steady-state operation. There is a low probability that the advanced controllers are properly enabled with adequate freedom to keep the unit in stable and optimal operation.

Today’s operations require a much more sophisticated evaluation of current alarm conditions, given the limited view available to anyone other than the console operator. Some companies include a screen dedicated to critical alarms as part of their DCS screen distribution, which allows ready access for the console operator, as well as technical support and line management.

Excellence in alarm management is truly a team effort. It requires competent console operators, a robust maintenance program, knowledgeable process control engineers, and leaders who require excellence in alarm management. It also requires appropriate metrics that monitor performance, reinforced in a way that encourages the investigation and resolution of the root causes of frequent alarms rather than the inhibiting of repeated alarms. Process units with their advanced control system activated have the highest likelihood of excellent alarm management performance and safe, reliable and profitable operation of the unit.

Artificial intelligence (AI) is beginning to be applied to alarm management in different industries and will play an increasing role going forward, but the fundamentals described in this article will still be applicable.

The authors’ company has a significant amount of hands-on experience and knowledge to support the implementation of effective alarm management programs. GP&LNG

ABOUT THE AUTHORS

BILL STRAKER has more than 40 yr of oil and gas experience, with an emphasis on maintenance, reliability and operational readiness (OR). He works with JEM Advisors and others.

TOM HANSON has more than 40 yr of oil and gas experience, with an emphasis on operations, maintenance, reliability and OR. He works with JEM Advisors and others.

REX KENYON has more than 50 yr of oil and gas experience, with an emphasis on maintenance, reliability and operations. He now works with JEM Advisors.

GARY LADNIER has more than 40 yr of oil and gas experience, with an emphasis on operations, systems completion and commissioning. He now works with JEM Advisors.

GAVIN McLEOD has more than 40 yr of oil and gas experience, with an emphasis on systems completion and commissioning.

THAD SAUVAIN has more than 35 yr of oil and gas experience, with an emphasis on process engineering and design.
