
The return of how to tame the HP Operations Manager message storms

GirishMatti

Co-written by Tobias Mauch, a very senior and much respected engineer on the HP OM team.

 

Welcome back to the second part of this series on managing event or message storms in the HP Operations Manager (HP OM) infrastructure monitoring environment.

 

To recap the first post: we saw how ECS can help you weather a storm, an HP OM message storm that is! If you missed it, you can read it here: How to tame the HP Operations Manager message storms! – Part 1.

 

The threat of message storms reminds me of one of my favorite movies, “The Terminator”. As I sat down to write this blog, I realized that some of the famous quotes from the movie also relate to operations management.

 

Come with me if you want to live

 

Having read and understood the previous blog post, are you wondering if there is another way to tame the storms without having to learn and use the ECS component? Well, there are two ways to achieve that. One of them is described below, so read on.

 

HP OMU 9.20 Event Storm Filter (ESF)

 

In this post we will look at the HP OM event storm filter mechanism, which works on the HP Operations Manager server. (Note: this feature requires HP Operations Manager version 9.20 or later.)

 

This feature uses a Message Stream Interface (MSI) program for message suppression. For this to work, you need to enable output of all messages to the MSI in Divert mode. The filter has many configuration options, including the ability to filter on one or more message attributes, define exceptions, and so on.

 

To use this feature, simply configure one or more rules (also called gates) in the /etc/opt/OV/share/conf/OpC/mgmt_sv/esf/flood_gates.conf configuration file. The filter then starts suppressing message storms that match a gate. You can also log suppressed messages or add them as annotations to the storm message. What’s more, you can define an automatic or operator-initiated action along with the message!

 

An added benefit of this mechanism is that, because no ECS policy is involved, it performs better than the ECS-based message storm filter.

 

Table A: visual representation of the storm filter

 

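Here is an example of an event storm filter gate rule. Add a gate like this one to the /etc/opt/OV/share/conf/OpC/mgmt_sv/esf/flood_gates.conf configuration file, and then enable the Event Storm Filter (the Event Storm Filter section of the HPOM Administrator's Reference Guide describes how):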

    GateName=All non-critical messages except self-monitoring and health check.
    Customer=%
    Node=!
    Object=!
    Application=!
    MsgGroup=!
    MsgType=%
    MsgText=%
    TT=%
    Rate=10
    Period=2
    Annotate=0
    Log=1
    CreateTT=0
    CloseMsg=1
    CloseAction=none
    ExcludeMsgGroup="OASelfMon"
    ExcludeMsgGroup="HC"
    ExcludeSeverity=Critical

This rule suppresses messages when more than ten messages (Rate) with the same combination of Node, Object, Application, and Message Group arrive within a period of two minutes (Period). Messages with severity Critical, or in message group OASelfMon or HC, are excluded from the rule. The suppressed messages are logged.
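
To make the Rate and Period arithmetic concrete, here is a minimal Python sketch of the counting logic that this gate describes. It is not the ESF implementation: the StormGate class, the message dictionaries, and the sliding-window counting are all assumptions made for illustration, and the real filter may count and reset differently.

```python
from collections import defaultdict, deque


# Hypothetical model of one gate. The class name and the sliding-window
# behaviour are illustrative assumptions, not the actual ESF code.
class StormGate:
    def __init__(self, key_fields, rate, period_minutes,
                 exclude_groups=(), exclude_severities=()):
        self.key_fields = key_fields            # fields set to "!" in the gate
        self.rate = rate                        # Rate
        self.period = period_minutes * 60       # Period, converted to seconds
        self.exclude_groups = set(exclude_groups)
        self.exclude_severities = set(exclude_severities)
        self.seen = defaultdict(deque)          # key -> recent arrival times

    def is_suppressed(self, msg, now):
        """Return True if msg arrives while its key is in a storm."""
        # Excluded messages always pass through and are not counted.
        if msg["MsgGroup"] in self.exclude_groups:
            return False
        if msg["Severity"] in self.exclude_severities:
            return False
        key = tuple(msg[f] for f in self.key_fields)
        times = self.seen[key]
        times.append(now)
        # Drop arrivals that have fallen out of the window.
        while times and now - times[0] >= self.period:
            times.popleft()
        # More than Rate messages inside the window counts as a storm.
        return len(times) > self.rate


gate = StormGate(
    key_fields=("Node", "Object", "Application", "MsgGroup"),
    rate=10, period_minutes=2,
    exclude_groups={"OASelfMon", "HC"},
    exclude_severities={"Critical"},
)

base = {"Node": "web01", "Object": "disk", "Application": "OS",
        "MsgGroup": "Perf", "Severity": "Warning"}

# Twelve identical warnings, one per second.
results = [gate.is_suppressed(dict(base), now=t) for t in range(12)]
critical = gate.is_suppressed(dict(base, Severity="Critical"), now=12)
print(results.count(True), critical)  # 2 False
```

In this model the first ten warnings pass, the eleventh and twelfth are suppressed, and the Critical message gets through because of the exclusion.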

 

Some explanations about the elements used in the gate config file:

- The flood_gates.conf file can have one or more gates (rules).

- For each gate you must define some mandatory configuration file parameters, and you can define some optional ones, each on a separate line.

- Some configuration file parameters define filter conditions for certain message fields (Node, Object, …).

  You can use the following values:

  ! - Matches messages that use this parameter as a key. For example, if the Customer field is set to !, the Rate number of events must occur in a specified period of time (Period) on the nodes of a specified customer for a storm to be reported. The events that occur on the nodes of other customers are ignored.

  % - Matches messages regardless of the value in this field.

  <value>% - Matches messages if the field begins with <value>.

  <value> - Matches messages if <value> is completely matched.

- Some lines define how many messages are considered a storm (Rate and Period).

- Some lines define actions that need to be taken if there is a storm (Annotate, Log, …).

- ESF lets you define customers, which makes this feature useful for service providers. The nodes that belong to customers are defined in the customer information file.
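
The four field values described above can be sketched in a few lines of Python. This is only my reading of those rules, not ESF code: the function names and the dictionary layout are hypothetical, and treating "!" as "accept any value and use the field as part of the counting key" is an assumption.

```python
def field_matches(pattern, value):
    """Apply one gate field pattern to one message field value."""
    if pattern in ("!", "%"):
        # "%" ignores the field. "!" is assumed to accept any value too,
        # but it marks the field as part of the counting key (see gate_key).
        return True
    if pattern.endswith("%"):
        return value.startswith(pattern[:-1])  # "<value>%": prefix match
    return value == pattern                    # "<value>": exact match


def gate_key(gate, msg):
    """Build the counting key from the fields the gate marks with "!"."""
    return tuple(msg[f] for f, p in gate.items() if p == "!")


def gate_applies(gate, msg):
    """A message is considered by the gate only if every field matches."""
    return all(field_matches(p, msg[f]) for f, p in gate.items())


# Hypothetical gate and message, using only the field conditions.
gate = {"Node": "!", "Application": "%", "MsgGroup": "Perf%", "Object": "disk"}
msg = {"Node": "web01", "Application": "OS",
       "MsgGroup": "Performance", "Object": "disk"}

print(gate_applies(gate, msg))  # True
print(gate_key(gate, msg))      # ('web01',)
```

With this gate, messages are counted per node, but only when the message group starts with "Perf" and the object is exactly "disk".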

 

Teaser for the next (and last) post in this series:

 

How to tame the HP Operations Manager message storms – Returns again!

Suppress the message storms on the agent nodes themselves, so that they never reach the HP OM server. Use this simple configuration and tame the storm. That is not all: you can also simplify the configuration deployment itself!

 

Hold that thought. To quote the Terminator: “I’ll be back!”

 

 

For more information, see the Event Storm Filter section in the HPOM Administrator's Reference Guide.
