What I would like to do is get "immediate" notification of service level violations. For example, we have a CI with an SLA of 99% availability. Uptime is based on all of the incidents where impact is not "None". If a new incident or service call is opened with this CI and total "down time" is > 1%, then we get a popup (i.e. banner) telling us this.
The part with the banner based on the value in a specific field is easy. The problem is accumulating the downtime and storing it somewhere accessible.
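To make the accumulation part concrete, here is a rough Python sketch of the calculation I have in mind. It is not the product's API; the function name `availability_percent` and the representation of each outage as a (start, end) pair are my own assumptions for illustration. Overlapping outages are merged so the same minutes are not counted twice.

```python
from datetime import datetime, timedelta

def availability_percent(outages, period_start, period_end):
    """Availability in percent over the period [period_start, period_end].

    outages: list of (start, end) datetime pairs (hypothetical format),
    one per incident/call that counts as downtime.
    """
    # Clip each outage to the reporting period and drop those outside it.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
        if end > period_start and start < period_end
    )

    down = timedelta(0)
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Gap before this outage: close the previous merged interval.
            if cur_end is not None:
                down += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current merged interval: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        down += cur_end - cur_start

    total = period_end - period_start
    return 100.0 * (1 - down / total)
```

With a 30-day period, 7.2 hours of accumulated downtime is exactly the 1% that a 99% SLA allows, so anything beyond that should trigger the banner.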
I looked at the Metric tab of the SLA form. This appears to be the right place. Am I on the right track?
The field "minimum availability" is supposed to be a percentage. Should this be entered as e.g. "99%" or just "99"?
The online help says "In the Failure Impact Codes block, select the impact level that constitutes service failure in the Service Call field." So far, so good. Another place says "Only events at or above the specified threshold impact level are included in a performance calculation."
So, I look in the admin console->Data->Codes->General->Impact. There are six entries with an ordering of 10-60. 10 is "None", so if I set "Failure Impact Codes" to 20, then everything >=20 will register as "not available" time. Is that correct?
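In other words, this is how I read the threshold rule, written as a small Python sketch. The field name `impact_ordering` and both function names are made up for illustration, and the "at or above" comparison is only my interpretation of the help text.

```python
FAILURE_THRESHOLD = 20  # assumed setting: the first ordering above "None" (10)

def counts_as_downtime(impact_ordering, threshold=FAILURE_THRESHOLD):
    """True if an event at this impact ordering is included in the calculation."""
    # "At or above the specified threshold" read as >=.
    return impact_ordering >= threshold

def downtime_events(events, threshold=FAILURE_THRESHOLD):
    """Keep only the events whose impact ordering reaches the threshold."""
    return [e for e in events if counts_as_downtime(e["impact_ordering"], threshold)]
```

If this reading is right, an event with impact "None" (10) is ignored and everything from 20 to 60 counts.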
At the bottom of the metric page there is the "Recurrence" setting. If I set it to daily at 00:00, then every day at midnight the availability for the *service* will be recalculated. Each CI can be related to 0 or more services. So far, so good.
When I open a new Incident and assign a CI to it, I want e.g. a banner to tell me that the calculated availability is < 99%. I already have a banner when the maintenance contract for the CI has expired, so that aspect is pretty clear. However, I am at a loss as to how to get the availability into the picture.
Once I figure out how to determine the current availability from the Incident/Call, I can set the priority accordingly. For example, if availability < 99%, then the Incident/Call is automatically set to high priority.
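The rule itself would be trivial once the availability value is reachable. As a Python sketch (the function name `check_sla`, the priority labels and the banner wording are all placeholders of mine, not anything the product provides):

```python
def check_sla(availability, minimum=99.0, current_priority="medium"):
    """Return (priority, banner_text) for an Incident/Call.

    availability and minimum are percentages, e.g. 98.5 and 99.0.
    banner_text is None when the SLA is still being met.
    """
    if availability < minimum:
        banner = (
            f"Availability {availability:.2f}% is below "
            f"the SLA minimum of {minimum:.2f}%"
        )
        return "high", banner
    # SLA still met: leave the priority as it was and show no banner.
    return current_priority, None
```

The hard part is only the input: where the `availability` value comes from at the moment the Incident/Call is opened.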
Is there any way to have the calculation of availability done dynamically, i.e. each time an Incident/Call is opened/saved? Is there any way to force the recalculation of all metrics? For example, the boss might want the calculation more than once a day, so we would trigger it manually several times during the day.
Note that in our case there is typically a 1:1 relationship between the CI and the service. When the database server goes down, the database service is obviously not accessible. However, there are cases where there is not a 1:1 relationship, for example when one of 4 web servers goes down: the web "service" is still accessible and there is no change to its availability.
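To illustrate the difference between the two cases, here is a Python sketch of how service downtime could be derived from CI outages. It assumes a rule of my own choosing: the service counts as down only while fewer than `min_required_up` of its CIs are running. The function name and the data layout are hypothetical, and the sketch assumes that the outages of any single CI do not overlap each other.

```python
from datetime import datetime, timedelta

def service_downtime(ci_outages, min_required_up=1):
    """Total time during which fewer than min_required_up CIs were up.

    ci_outages: dict mapping a CI name to a list of (start, end) datetime
    pairs. Assumes one CI's own outages never overlap each other.
    """
    total_cis = len(ci_outages)
    events = []
    for intervals in ci_outages.values():
        for start, end in intervals:
            events.append((start, 1))   # one more CI is down
            events.append((end, -1))    # that CI is back up
    # At the same instant, handle recoveries (-1) before failures (+1).
    events.sort(key=lambda ev: (ev[0], ev[1]))

    down_count = 0
    downtime = timedelta(0)
    prev_time = None
    for time, delta in events:
        # Charge the span since the previous event if the service was down.
        if prev_time is not None and total_cis - down_count < min_required_up:
            downtime += time - prev_time
        down_count += delta
        prev_time = time
    return downtime
```

With a single database server and `min_required_up=1`, every CI outage is service downtime. With 4 web servers and the same setting, one server going down adds nothing; only a period in which all four are down at once would count.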