HPE Software is now Micro Focus

How to tame the HP Operations Manager message storms! – Part 1

GirishMatti

Co-written by Tobias Mauch, a very senior and much respected engineer on the HP OM team.

 

Generally, when a storm hits, you simply have to weather it and hope it does not damage your property. In the context of HP Operations Manager (HP OM), these storms consist of a huge number of messages or events that hit HP OM in a short period of time. The source of these messages or events is the HP Operations agent, which is part of the infrastructure monitoring software. In many cases, these storms are triggered by events that all report the same failure.

 

Any customer with a large installation of agents has potentially faced message storms or floods. As you know, handling and weathering such floods is costly in terms of time and effort.

 

Here are three easy methods to detect and prevent these storms. The first two approaches work on the HP OM server and the last one is provided by the HP Operations agent.

 

  1. Event Correlation Services (ECS) based message storm detection
  2. HP OMU 9.20 Event Storm Filter
  3. HP Operations agent Message Storm Suppression

 

In this blog post, I will introduce the first approach. In the next two posts, I will explain more about the other options.

 

 

Event Correlation Services based message storm detection.

 

In this method, Event Correlation Services (ECS) circuits are used to prevent message storms (either message-based or policy-based). This approach has been around the longest.

 

Message storm detection and suppression is done on the management server by an ECS policy. For this, you need to enable output of all messages to the MSI in Divert mode and assign the ECS policy to the management server itself. The configuration, including the rate of incoming events and the interval, is performed by changing lines in the ECS fact store file for the ECS policy.
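To make the idea of "rate of incoming events and interval" concrete, here is a minimal Python sketch of rate-based detection per node. It is not the ECS circuit and does not reflect the fact store file format; the class name `StormDetector` and the parameters `max_messages` and `interval_seconds` are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class StormDetector:
    """Flags a node as 'in storm' when it sends too many messages per interval.

    Illustrative only: the names and the sliding-window approach are
    assumptions, not the behavior of the real ECS circuit.
    """

    def __init__(self, max_messages: int, interval_seconds: float):
        self.max_messages = max_messages
        self.interval_seconds = interval_seconds
        self._arrivals = defaultdict(deque)  # node name -> recent arrival times

    def observe(self, node: str, timestamp: float) -> bool:
        """Record one message; return True if the node is over the limit."""
        window = self._arrivals[node]
        window.append(timestamp)
        # Drop arrivals that have fallen out of the interval.
        while window and timestamp - window[0] >= self.interval_seconds:
            window.popleft()
        return len(window) > self.max_messages


detector = StormDetector(max_messages=3, interval_seconds=60)
results = [detector.observe("node1", t) for t in (0, 10, 20, 30, 100)]
print(results)
```

With a limit of 3 messages per 60 seconds, the fourth message inside the window is the first one flagged, and the node returns to normal once the earlier arrivals age out.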

 

 Message flow scenarios:

 

Figure A: Message flow when suppression is enabled.

 

Possible message flows:

• Normal flow 1 -> 2 -> 3

• Flow when detecting a message storm 1 -> 2 -> 4 -> 5 -> 6 -> 7

• Flow after a message storm 1 -> 2 -> 3 & 3 -> 8 -> 9

 

   

  

  

 

 

 

  

 

 


 

Figure B: Message flow when suppression is disabled.


  Possible message flows:

• Normal flow 1 -> 2 -> 3

• Flow when detecting a message storm 1 -> 2 -> 4 -> 5 -> 6 -> 7 & 2 -> 10

• Flow after a message storm 1 -> 2 -> 3 & 3 -> 7 -> 8

 

In addition to the steps described for "Suppression enabled", step 10 is performed: messages are sent to the message browser even when a message storm has been detected.

 

  

 

 

 

 


  

You can configure the circuit so that messages received by the management server are not sent to the message browser until the message storm has stopped. (For policy-based message storm detection, it is also possible to create exceptions, so that some policies, nodes, or combinations of both are never disabled.)
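The exception idea can be pictured as a small matching rule. The sketch below is only an illustration in Python: the rule format (a `(policy, node)` pair with `None` as a wildcard), the function name `is_exempt`, and the sample policy and node names are all hypothetical and are not taken from the ECS policy configuration.

```python
from typing import Optional, Sequence, Tuple

# Each rule is (policy, node); None acts as a wildcard. Hypothetical format.
ExceptionRule = Tuple[Optional[str], Optional[str]]


def is_exempt(policy: str, node: str, rules: Sequence[ExceptionRule]) -> bool:
    """Return True if this policy/node pair must never be disabled."""
    for rule_policy, rule_node in rules:
        policy_matches = rule_policy is None or rule_policy == policy
        node_matches = rule_node is None or rule_node == node
        if policy_matches and node_matches:
            return True
    return False


rules = [
    ("heartbeat", None),                  # never disable this policy on any node
    (None, "db01.example.com"),           # never disable anything on this node
    ("disk_check", "web01.example.com"),  # one specific combination
]

print(is_exempt("disk_check", "web02.example.com", rules))
```

The three sample rules correspond to the three cases mentioned above: a policy, a node, or a combination of both.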

 

 

There are two ECS circuits to choose from:

 

a) MsgStorm_Dectect: This ECS policy suppresses messages if the number of messages from a particular node exceeds the configured limit.

By default, the ECS policy creates an automatic action that stops the agent on the affected managed node, but you can configure the action to do nothing.

 

b) PolicyStorm_Dectect: This ECS policy suppresses messages if the number of messages from a particular policy on a managed node exceeds the configured limit.

By default, this ECS policy creates an automatic action that disables the affected policy on the managed node, but you can configure the action to do nothing.
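The difference between the two circuits comes down to what is counted: messages per node, or messages per policy on a node. The Python sketch below illustrates that difference and the default reaction of each circuit. It is an assumption-laden simplification: `plan_actions`, its parameters, and the action strings are hypothetical, messages are reduced to `(node, policy)` pairs, and one batch of messages stands in for one detection interval.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Message = Tuple[str, str]  # (node, policy); a simplified stand-in for a real message


def plan_actions(messages: Iterable[Message], limit: int,
                 per_policy: bool, actions_enabled: bool = True) -> Dict[tuple, str]:
    """Count messages per source and describe the reaction for each source over the limit."""
    counts = Counter((node, policy) if per_policy else (node,)
                     for node, policy in messages)
    plan = {}
    for key, count in counts.items():
        if count <= limit:
            continue
        if not actions_enabled:
            plan[key] = "suppress only"  # action configured to do nothing
        elif per_policy:
            plan[key] = f"disable policy {key[1]} on {key[0]}"
        else:
            plan[key] = f"stop agent on {key[0]}"
    return plan


batch = [("web01", "disk")] * 4 + [("web01", "cpu")] + [("db01", "disk")] * 2
by_node = plan_actions(batch, limit=3, per_policy=False)
by_policy = plan_actions(batch, limit=3, per_policy=True)
print(by_node)
print(by_policy)
```

In this sample, node-based counting reacts to everything coming from web01, while policy-based counting singles out only the noisy disk policy and leaves the rest of the node's monitoring untouched.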

 

For more information on this method, read the Message-Storm Detection White Paper here.

 

For more information on ECS itself, see here:

 

For more information on how HP Operations Manager can help you with infrastructure monitoring, visit the product home page here.

 

 


Comments
N/A

Every situation definitely varies, and one really has to assess the issue first before choosing a preferred method that would potentially tame HP Operations messages. These are very easy-to-follow methods, however. Thank you for sharing!

Honored Contributor.

Agreed. In our next posts, we will point out two more methods for solving this problem.

Thanks for your feedback.