Events, ETIs, HIs, KPIs, CI status, status propagation: do these terms sound overwhelming and confusing? I guess not. However, how they fit together to reflect the impact of an event in a CI's health status can be harder to understand. Questions such as “I closed the event, but my related CI status still shows critical”, “I want to see the HI values on parent CIs” and “I defined a new KPI, but it does not show up in Service Health” show that this flow is often not well understood.
In this blog, I make an effort to simplify this critical flow from “OMi events to CI status”.
Here are some basics about the CI status:
- CI status and its color are defined by the worst status among all KPIs that are applicable to the CI and included in its status calculation. For each view in Service Health, you can configure which KPIs are used in the CI status calculation; by default, all KPIs are used.
- A KPI is assigned to and created on a CI either by direct assignment (using KPI assignments) OR by propagation from child CIs.
- A KPI is calculated from the HIs assigned to the same CI (also called self HIs) and/or from the KPIs on child CIs (depending on the KPI assignment definition), or from other KPIs on the same CI.
- KPIs propagate from child to parent CIs (this behavior can be changed using propagation rules).
- The parent-child relationship is defined by the Service Health impact model, which is based on impact relations between CI types (impact relations are created by impact triplets in RTSM).
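To make the rules above concrete, here is a minimal Python sketch of how self HIs roll up into a KPI, how that KPI propagates to a parent CI, and how CI status ends up as the worst included KPI. It is a simplified model under stated assumptions, not how OMi implements it: the `Status` scale, the `CI` class, `kpis()` and `ci_status()` are hypothetical names, every calculation uses a worst-status rule (real KPI assignments can use other calculation rules), and propagation rules and "no data" states are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    """Simplified status scale (hypothetical): a higher value is worse."""
    OK = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass
class CI:
    name: str
    # Self HIs: HI name -> current state (only HIs that have been set)
    self_his: dict[str, Status] = field(default_factory=dict)
    # Hypothetical KPI assignment: KPI name -> self HIs that feed it
    kpi_his: dict[str, set[str]] = field(default_factory=dict)
    # Child CIs according to the impact model
    children: list[CI] = field(default_factory=list)
    # KPIs included in status calculation for the view; None means all
    status_kpis: set[str] | None = None


def kpis(ci: CI) -> dict[str, Status]:
    """Compute KPI name -> status for a CI using worst-status everywhere."""
    result: dict[str, Status] = {}
    # KPIs calculated from self HIs on the same CI
    for kpi, hi_names in ci.kpi_his.items():
        states = [ci.self_his[h] for h in hi_names if h in ci.self_his]
        if states:
            result[kpi] = max(states)
    # KPIs propagated from child CIs (HIs themselves never propagate)
    for child in ci.children:
        for kpi, status in kpis(child).items():
            result[kpi] = max(result.get(kpi, Status.OK), status)
    return result


def ci_status(ci: CI) -> Status:
    """CI status is the worst status among the KPIs included for the view."""
    included = [
        status
        for kpi, status in kpis(ci).items()
        if ci.status_kpis is None or kpi in ci.status_kpis
    ]
    return max(included, default=Status.OK)


# Usage: a critical fan HI on a node turns its parent service critical
node = CI(
    "node1",
    self_his={"Device Fan Status": Status.CRITICAL},
    kpi_his={"SystemEnvQuotient": {"Device Fan Status"}},
)
service = CI("web-service", children=[node])
print(ci_status(service).name)  # CRITICAL
```

Note that `service.self_his` stays empty: only the `SystemEnvQuotient` KPI reaches the parent, which is why HI values do not show up on parent CIs.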
What are ETIs and HIs?
These two terms cause a lot of confusion because Service Health uses the term HI, while OMi uses both HI and ETI. Both HIs and ETIs are defined on CI types, but ETI is an OMi concept, while HI is relevant in the Service Health context.
ETI - Event Type Indicator
ETI is an event attribute used to generalize events coming into OMi from different data sources. An event with an ETI can set an HI on the related CI.
HI - Health Indicator
HI expresses the precise, current state of a CI and can be set by events carrying ETI information or by metrics data. HI state is persistent: it does not change when the event state changes. HI state changes only when a new event or data sample arrives, or when it is changed manually.
An ETI that contributes to the health of a CI is called an HI. Events can map to ETIs or HIs using the ‘ETIHint’ information sent in the events OR using the indicator mapping rules in OMi.
Both ETIs and HIs are used for TBEC (Topology Based Event Correlation), BUT only HIs are used to set KPIs and hence the status of CIs.
KPIs propagate from child to parent CIs, NOT HIs.
HI state does not change when the corresponding event is closed.
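The following minimal Python sketch illustrates these points: an event carrying an ETI hint sets an HI state, only indicators defined as HIs feed health, and closing the event leaves the HI state untouched. Everything here is hypothetical (`Event`, `HIStore`, `parse_eti_hint()`, the `Normal` value), and the `<indicator>:<value>` hint format is assumed from the `Device_Fan_Status:Critical` example used in the steps of this post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    ci: str                  # CI the event is related to (already resolved)
    eti_hint: str | None     # assumed "<indicator>:<value>" format
    state: str = "open"


def parse_eti_hint(hint: str) -> tuple[str, str]:
    """Split an ETI hint such as 'Device_Fan_Status:Critical'."""
    indicator, sep, value = hint.partition(":")
    if not sep or not indicator or not value:
        raise ValueError(f"malformed ETI hint: {hint!r}")
    return indicator, value


class HIStore:
    """Hypothetical in-memory store of persistent HI states per CI."""

    def __init__(self, health_indicators: set[str]) -> None:
        # Indicators defined as HIs; anything else is treated as ETI-only
        self.health_indicators = health_indicators
        self.states: dict[tuple[str, str], str] = {}

    def on_new_event(self, event: Event) -> None:
        if not event.eti_hint:
            return
        indicator, value = parse_eti_hint(event.eti_hint)
        # Only HIs set a persistent health state; ETI-only indicators
        # could still be used for TBEC but are ignored in this sketch
        if indicator in self.health_indicators:
            self.states[(event.ci, indicator)] = value

    def on_event_closed(self, event: Event) -> None:
        # Closing changes the event only; the HI state is deliberately kept
        event.state = "closed"

    def set_manually(self, ci: str, indicator: str, value: str) -> None:
        self.states[(ci, indicator)] = value


store = HIStore(health_indicators={"Device_Fan_Status"})
critical = Event("node1", "Device_Fan_Status:Critical")
store.on_new_event(critical)
store.on_event_closed(critical)
print(store.states[("node1", "Device_Fan_Status")])  # Critical
store.on_new_event(Event("node1", "Device_Fan_Status:Normal"))
print(store.states[("node1", "Device_Fan_Status")])  # Normal
```

This is the “I closed the event, but my CI is still critical” situation: the HI, and therefore the KPI, only recovers when a new event or data sample reports a better state, or when someone changes it manually.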
The following picture depicts the hierarchical structure of Events, HIs and KPIs:
NOTE: HIs can be set by events or metrics, but in this post, we consider only event-based HIs and NOT metrics-based HIs.
Steps for mapping custom events to CI status by creating a new KPI and HI:
There are many HIs and KPIs already defined on various CI types as part of OMi content packs. You should check them out to see if they meet your needs before creating new ones.
1. Define an HI on the CIT of interest, for example, “Device Fan Status” on the Node CIT.
2. Optionally, define mapping rules for this HI (if no ETI hint is sent, a mapping rule can map the incoming event to the HI; this is especially useful when you cannot modify the event source to add ETI hints).
3. Define a new KPI; let's call it “SystemEnvQuotient”.
4. Define a KPI assignment rule on the same CIT used for the HI definition, i.e. ‘Node’, configure it to be calculated by HIs and child KPIs, and add the newly created HI ‘Device Fan Status’.
5. Send an example event with the CI hint set to a test node and the ETI hint set to "Device_Fan_Status:Critical", or send an event with severity ‘critical’ and message text "fan speed is slow" (in this case, the mapping rule is used to set the HI).
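Steps 2 and 5 describe two ways for an event to reach the HI: an explicit ETI hint, or a mapping rule that matches event attributes. The sketch below models that fallback with a hypothetical `MappingRule` that matches on severity and a message-text regex. Real OMi indicator mapping rules are configured in the UI with their own condition syntax; the class, `resolve_indicator()`, and the first-match-wins ordering are assumptions for illustration only. It also assumes `Device_Fan_Status` is the name of the HI whose display label is “Device Fan Status”.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """Hypothetical, simplified indicator mapping rule."""
    indicator: str      # HI to set
    value: str          # HI state to set
    severity: str       # event severity the rule matches (lowercase)
    text_pattern: str   # regex searched in the event message text

    def matches(self, severity: str, text: str) -> bool:
        return (
            severity.lower() == self.severity
            and re.search(self.text_pattern, text, re.IGNORECASE) is not None
        )


RULES = (MappingRule("Device_Fan_Status", "Critical", "critical", r"fan speed is slow"),)


def resolve_indicator(
    severity: str,
    text: str,
    eti_hint: str | None = None,
    rules: tuple[MappingRule, ...] = RULES,
) -> tuple[str, str] | None:
    """Return (HI, value) for an event, or None if nothing maps."""
    if eti_hint:
        # An explicit ETI hint in the event is used directly
        indicator, _, value = eti_hint.partition(":")
        return indicator, value
    # Otherwise fall back to the mapping rules (first match wins here)
    for rule in rules:
        if rule.matches(severity, text):
            return rule.indicator, rule.value
    return None


print(resolve_indicator("critical", "", eti_hint="Device_Fan_Status:Critical"))
# ('Device_Fan_Status', 'Critical')
print(resolve_indicator("critical", "Fan speed is slow on chassis 2"))
# ('Device_Fan_Status', 'Critical')
print(resolve_indicator("minor", "fan speed is slow"))
# None
```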
When you create or modify a KPI assignment on a CIT, remember that if a subtype of that CIT has its own modified version of that assignment, your changes will not apply to that subtype. For example, let's say that on the Node CIT there is a KPI assignment called “Node Mapping” that assigns the “System Availability” KPI. You edit it and add your new KPI, expecting that every CI of type Node or one of its subtypes (such as Computer, Windows, etc.) will get the new KPI. However, this may not happen if the “Node Mapping” assignment has been changed on the Computer CIT. In that case, your updated assignment applies to CIs of type Node, but not to Computer or its subtypes. To change this behavior, you have to either create a new KPI assignment OR update the existing assignment on all relevant CITs.
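The override behavior described above can be modeled as a lookup that walks up the CIT hierarchy and uses the nearest CIT on which an assignment with a given name is defined. This is a hypothetical sketch of the described behavior, not OMi's actual resolution logic; `CIT_PARENT`, `ASSIGNMENTS`, `effective_assignment()`, `kpis_for()` and the “Env Mapping” assignment are invented names, and the hierarchy is trimmed to three CITs.

```python
from __future__ import annotations

# Hypothetical, simplified CIT hierarchy: CIT -> parent CIT
CIT_PARENT: dict[str, str | None] = {
    "Node": None,
    "Computer": "Node",
    "Windows": "Computer",
}

# KPI assignments: (CIT the assignment is defined on, name) -> KPIs
ASSIGNMENTS: dict[tuple[str, str], set[str]] = {
    # "Node Mapping" on Node was edited to add the new KPI ...
    ("Node", "Node Mapping"): {"System Availability", "SystemEnvQuotient"},
    # ... but Computer has its own modified copy of "Node Mapping"
    ("Computer", "Node Mapping"): {"System Availability"},
}


def effective_assignment(
    cit: str,
    name: str,
    assignments: dict[tuple[str, str], set[str]],
    parents: dict[str, str | None],
) -> set[str]:
    """Use the assignment defined on the nearest CIT up the hierarchy."""
    current: str | None = cit
    while current is not None:
        if (current, name) in assignments:
            return assignments[(current, name)]
        current = parents.get(current)
    return set()


def kpis_for(
    cit: str,
    assignments: dict[tuple[str, str], set[str]],
    parents: dict[str, str | None],
) -> set[str]:
    """Union of the KPIs from every assignment name that resolves for a CIT."""
    result: set[str] = set()
    for name in {name for _, name in assignments}:
        result |= effective_assignment(cit, name, assignments, parents)
    return result


print(sorted(kpis_for("Node", ASSIGNMENTS, CIT_PARENT)))
# ['System Availability', 'SystemEnvQuotient']
print(sorted(kpis_for("Windows", ASSIGNMENTS, CIT_PARENT)))
# ['System Availability']

# Workaround: a new assignment that no subtype overrides is inherited
ASSIGNMENTS[("Node", "Env Mapping")] = {"SystemEnvQuotient"}
print(sorted(kpis_for("Windows", ASSIGNMENTS, CIT_PARENT)))
# ['System Availability', 'SystemEnvQuotient']
```

Adding the KPI through a separate assignment works in this model because no subtype has its own copy of it, which corresponds to the first workaround mentioned above.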
Special KPIs based on OMi events lifecycle state:
“Unassigned Events” and “Unresolved Events” are special KPIs that are calculated based on the number of active events and their severity. We will discuss these KPIs and their use cases in another blog.