With our announcement of Service Health Analyzer (SHA) on November 29th, we plan to publish a series of posts about this new predictive analytics solution. Over the next few weeks, we will be posting write-ups from several members of our SHA R&D team.
This first SHA post describes a technique for triggering the baseline calculation sooner. In the first few hours after installation, the required configuration will let you see some anomalies based on availability problems, and towards the end of the first 24 hours, anomalies will also be based on normal behavior characteristics: the baseline sleeve.
Please note that the information in this blog post, “Installing SHA on 1 box -- Getting value within 2 days,” is intended for POCs, not for large production deployments.
This post is written by Shahar Tal.
It is good to know that our predictive analytics product, Service Health Analyzer (SHA), which can be thought of as a zero-configuration, zero-maintenance product, also has near-zero extra hardware cost and a fast ROI. In fact, it is possible to start seeing results from SHA within 24 hours of installing it in POC mode.
The First Day
This whole method is based on speeding up the statistical learning process performed by the baseline engine by invoking it manually; otherwise, it is invoked automatically only after about one week of operation. A manual baseline calculation creates a baseline sleeve from the data available so far (in our case, one day's worth). In that case, no seasonality is calculated (seasonality being the repetitive behavioral pattern of metrics, such as weekend patterns), so the sleeve will be less accurate. After the automatic learning process runs again, there will be much more data, and by the law of large numbers the baseline will be much more robust.
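The post does not describe SHA's actual baseline algorithm. As a minimal sketch of why a one-day sleeve lacks seasonality, assume a sleeve is simply a band of mean ± k standard deviations computed over all available samples; `Sleeve`, `build_flat_sleeve`, and the default `k=3.0` are hypothetical names and values, not part of SHA.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Sleeve:
    """Hypothetical baseline sleeve: the band of 'normal' values for one metric."""
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        # A value inside the band is treated as normal behavior.
        return self.lower <= value <= self.upper


def build_flat_sleeve(samples: list[float], k: float = 3.0) -> Sleeve:
    """Build a sleeve from whatever history exists (e.g. one day of samples).

    No seasonality is modeled: every sample is assumed to come from the same
    distribution, so the band is simply mean +/- k sample standard deviations.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate the spread")
    mean = statistics.fmean(samples)
    spread = statistics.stdev(samples)
    return Sleeve(lower=mean - k * spread, upper=mean + k * spread)


# A tiny "day one" of response-time samples, in milliseconds.
day_one = [100.0, 102.0, 98.0, 101.0, 99.0]
sleeve = build_flat_sleeve(day_one)
print(sleeve.contains(104.0), sleeve.contains(110.0))  # True False
```

With a week or more of history, the same idea could be applied per time-of-week bucket (for example, one band per weekday and hour), which is one way a sleeve can capture the seasonality that a one-day sleeve cannot.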
The next step is to define the business services or applications we want to analyze. For these Configuration Items (CIs), SHA stores the metrics used for baseline learning, in a separate SHA schema similar in concept to the Profile DB.
To see the events, you must create business CI models that link infrastructure CIs to application CIs.
In fact, as long as the topology also includes host nodes below these CIs, we can also use SiteScope, Diagnostics, and even OM SPI data.
After about 10 minutes, the samples are assigned metric IDs and SHA starts storing them. At this stage, any anomalies we get, which appear as new notifications (events/alerts) or as CIs with “bad” Performance Analytics KPIs in the Service Health dashboard, are based only on two consecutive availability anomalies (to ensure the problem is not transient). Still, there is an extra step to follow in order to receive the anomalies.
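The "two consecutive availability anomalies" rule can be illustrated with a small sketch. This is not SHA code; `availability_anomalies` and its `required` parameter are hypothetical, and the sketch assumes one availability result per polling interval.

```python
def availability_anomalies(samples: list[bool], required: int = 2) -> list[int]:
    """Return the sample indices at which an availability anomaly is raised.

    `samples` holds one availability result per polling interval
    (True = available). An anomaly is raised only when `required`
    consecutive samples are unavailable, so a single failed poll
    (a transient blip) never raises one.
    """
    raised: list[int] = []
    streak = 0  # current run of consecutive unavailable samples
    for i, available in enumerate(samples):
        streak = 0 if available else streak + 1
        if streak == required:  # raise once per outage, not on every later sample
            raised.append(i)
    return raised


polls = [True, False, True, False, False, False, True, False, False]
print(availability_anomalies(polls))  # [4, 8]
```

The single failure at index 1 is ignored; the two longer outages each raise exactly one anomaly.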
By default, SHA issues only “true” anomalies: small anomalies that do not pass a significance test are ignored. This is one of the SHA mechanisms that reduce false alarms.
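The post does not say which significance test SHA uses. As one familiar example of the idea, the sketch below applies a one-sample z-test to a window of recent values against a known baseline mean and standard deviation; `is_significant` and the threshold of 3.0 are assumptions for illustration only.

```python
import math
from statistics import fmean


def is_significant(window: list[float], baseline_mean: float,
                   baseline_std: float, z_threshold: float = 3.0) -> bool:
    """One-sample z-test: does the window's mean deviate significantly from the baseline?"""
    if not window:
        return False
    if baseline_std <= 0:
        raise ValueError("baseline_std must be positive")
    # Standard error of the window mean shrinks as the window grows.
    standard_error = baseline_std / math.sqrt(len(window))
    z = (fmean(window) - baseline_mean) / standard_error
    return abs(z) > z_threshold


print(is_significant([101.0, 99.0, 102.0, 103.0], 100.0, 10.0))  # False: small deviation
print(is_significant([120.0, 118.0, 122.0, 120.0], 100.0, 10.0))  # True: z = 4.0
```

A small drift (z = 0.25 here) is discarded, which is the kind of filtering that keeps minor fluctuations from becoming alerts.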
Once issued, the anomaly event summarizes the highlights of the anomaly, based on the combined topology of the CIs whose metrics were breached.
* The event console *
The Second Day
After experimenting with the basic anomalies on the first day, it is time to start getting real, baseline-based anomalies. The baseline engine depends on a prerequisite offline process that runs automatically every night and summarizes the raw data; the baseline engine then runs on top of that output. By default, the baseline engine requires its own server, because it analyzes an enormous amount of data and takes a while to do so. There is, however, a way to force it to run on the already installed BSM server. Note: this is why we should not define all the CIs for analysis on this single BSM+SHA box.
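The post does not specify the format of the nightly summary. As a minimal sketch, assuming raw samples are collapsed into per-hour running totals (count, sum, sum of squares) from which a mean and spread can later be derived, it could look like this; `HourlySummary` and `summarize_hourly` are hypothetical names.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HourlySummary:
    """Running totals for one metric over one hour (hypothetical format)."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.total_sq += value * value

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def std(self) -> float:
        # Population standard deviation from the running sums; clamp tiny
        # negative values caused by floating-point rounding.
        variance = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(variance, 0.0))


def summarize_hourly(raw: list[tuple[datetime, float]]) -> dict[datetime, HourlySummary]:
    """Collapse raw (timestamp, value) samples into one summary per hour."""
    buckets: dict[datetime, HourlySummary] = defaultdict(HourlySummary)
    for ts, value in raw:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].add(value)
    return dict(buckets)


raw = [
    (datetime(2011, 12, 1, 9, 5), 10.0),
    (datetime(2011, 12, 1, 9, 35), 20.0),
    (datetime(2011, 12, 1, 10, 0), 5.0),
]
for hour, s in sorted(summarize_hourly(raw).items()):
    print(hour.strftime("%H:%M"), s.count, s.mean, s.std)
# 09:00 2 15.0 5.0
# 10:00 1 5.0 0.0
```

Keeping only sums per bucket means the baseline step reads a few numbers per hour instead of every raw sample, which is one reason such a pre-aggregation pass is commonly run before heavier analysis.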
The baseline calculation runs asynchronously; after some time, depending on the number of CIs selected in the CI Selection UI, the metrics are enriched with their normal behavior values. From that point on, the SHA run-time anomaly detection engine can alert the user not only to availability problems but also to abnormal behavior.
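The shift from availability-only anomalies to behavior anomalies can be sketched as a simple classification of each sample. This is an illustration under assumptions, not SHA's detection engine: `classify_sample` is hypothetical, and it treats a metric with no sleeve yet (`lower`/`upper` of `None`) as one that can only be judged on availability.

```python
from typing import Optional


def classify_sample(available: bool, value: Optional[float],
                    lower: Optional[float] = None,
                    upper: Optional[float] = None) -> Optional[str]:
    """Return the kind of problem a sample shows, or None if it looks normal.

    Before the metric is enriched with a sleeve (lower/upper are None), only
    availability can be judged; afterwards, values outside the sleeve also count.
    """
    if not available:
        return "availability"
    if value is None or lower is None or upper is None:
        return None  # no baseline yet: behavior cannot be judged
    if value < lower or value > upper:
        return "behavior"
    return None


print(classify_sample(True, 250.0))               # None (first day: no sleeve)
print(classify_sample(True, 250.0, 90.0, 110.0))  # behavior (second day)
print(classify_sample(False, None, 90.0, 110.0))  # availability
```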
* SHA UI *
For further information, refer to the BSM documentation on Using Service Health Analyzer and to SHA best practices – 1 box in 2 days (both attached).
Product Marketing Manager for the HP Application Performance Management suite of software products. Before this role, I worked in the HP StorageWorks Division, both as a Product Marketing Manager overseeing enterprise hardware and software and as a Business Development Manager for the Enterprise Services channel.