IT Operations Management (ITOM)

Using Operations Analytics to solve IT problems



About a year ago, while on a customer visit, we were discussing IT issues and solutions and the following story came up.

The customer told us that one of his primary applications had begun running slowly a month earlier. They received some complaints from users and also saw performance numbers drop in the monitoring tools. The slowdown was inconsistent. Response time was fine for one minute, and a minute later it took forever… and it continued like this. The response time also kept fluctuating at night, when there are far fewer users. The application owner knew there were no recent updates or patches for this application, and none of the incoming events seemed to be relevant.

After a couple of days (and many hours of investigation) they accidentally found that a server running in debug mode had caused all this trouble.

What would have happened if they had Operations Analytics? What could they have done to shorten the time to resolution? What could they have done to reduce the number of hours invested in solving the issue?  Well… a lot!

With HP Operations Analytics they could have viewed a dashboard for this problematic application (you can prepare one for each application up front, or ad hoc as you need it). Operations Analytics collects the data all the time, so you can view it and use it whenever you need it.

So when the problem was reported, they could have simply opened the dashboard. There they would have easily seen the rises in response time, just as they saw them in their own monitoring tools. But Operations Analytics does not show only response time; it also shows availability changes, server metrics, event counts, log messages and more.

But that’s not all. By using the time slider they can easily focus on the time when response time started increasing:

 1 slider.png



The time slider affects all the dashboard panes. This allows the user to look for changes in other metrics and log messages that happened at the same time.

In our case, they would have found higher disk IO for one of the application servers and that log message rates went up at about the same time. The playback feature can help pinpoint the exact time when the issue started:

2 play.png


They can then select the time window when the issue started:

 3 small slider.png
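The idea of narrowing down to the moment response time started increasing can be sketched in a few lines of Python. This is only an illustration of the concept, not how Operations Analytics works internally: the function name `find_onset`, the baseline size, the threshold and the sample data are all assumptions made up for this example.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def find_onset(samples, baseline_size=5, threshold_sigmas=3.0):
    """Return the timestamp of the first sample above the baseline limit.

    samples is a list of (timestamp, response_time_seconds) tuples in
    time order. The first baseline_size samples are assumed to be healthy.
    Returns None if no sample exceeds the limit.
    """
    baseline = [value for _, value in samples[:baseline_size]]
    limit = mean(baseline) + threshold_sigmas * stdev(baseline)
    for timestamp, value in samples[baseline_size:]:
        if value > limit:
            return timestamp
    return None


# Hypothetical one-minute response time samples (seconds), for illustration only.
start = datetime(2024, 1, 1, 9, 0)
values = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 4.8, 1.0, 5.3]
samples = [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]

onset = find_onset(samples)
print("Response time first exceeded the baseline at:", onset)
```

The timestamp found this way plays the role of the start of the selected time window: every other metric and log message is then examined around that point.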


For the selected time window, they can now review the log messages that were written. If there are any issues, there is a good chance you can find one or more log messages that explain the root cause. The time-based correlation improves your chances of finding these relevant messages.

Looking at the log messages, it is immediately clear that more than a few messages containing the word “Debug” come from the same server that shows high disk IO. Transactions were using this server inconsistently, and therefore the performance was intermittent. The cause is now clear, and it took minutes instead of days to figure it out.
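The time-based correlation described above can be approximated with a small Python sketch: keep only the log messages written inside the selected time window, then count the ones that mention "debug" for each server. The record layout, host names and function name here are hypothetical and exist only for this example; they are not taken from the product.

```python
from collections import Counter
from datetime import datetime


def debug_counts_by_host(log_records, window_start, window_end, keyword="debug"):
    """Count log messages containing keyword per host inside a time window.

    log_records is an iterable of (timestamp, host, message) tuples.
    The keyword match is case-insensitive.
    """
    counts = Counter()
    for timestamp, host, message in log_records:
        in_window = window_start <= timestamp <= window_end
        if in_window and keyword in message.lower():
            counts[host] += 1
    return counts


# Hypothetical log records, for illustration only.
records = [
    (datetime(2024, 1, 1, 9, 5), "app-server-1", "INFO request served"),
    (datetime(2024, 1, 1, 9, 6), "app-server-2", "DEBUG entering handler"),
    (datetime(2024, 1, 1, 9, 6), "app-server-2", "DEBUG query plan dumped"),
    (datetime(2024, 1, 1, 9, 7), "app-server-1", "INFO request served"),
    (datetime(2024, 1, 1, 9, 30), "app-server-2", "DEBUG outside the window"),
]

counts = debug_counts_by_host(
    records,
    window_start=datetime(2024, 1, 1, 9, 6),
    window_end=datetime(2024, 1, 1, 9, 10),
)
for host, count in counts.most_common():
    print(host, count)
```

A server that stands out in this count and also shows high disk IO in the same window is a strong candidate for the root cause, which is the reasoning the dashboard supports visually.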


4 page.png 

Metrics and Log messages for the application


5 page.png

Metrics and Log messages – Focus on the start time


6 page.png

Log messages (with DEBUG) – Focus on the start time


HP Operations Analytics shortens the time it takes to resolve business issues with a single-pane-of-glass view. It presents application metrics, system metrics and log messages in one dashboard with a time-based focus, letting you drill down from a performance issue to the log messages that explain it.

To learn more about Operations Analytics visit us at



About the Author


Architect and User Experience expert with more than 10 years of experience in designing complex applications for all platforms. Currently in Operations Analytics - Big data and Analytics for IT organisations. Follow me on Twitter @nuritps


This is a classic IT Operations issue and, as stated, analytics plays an important role. What I would add is that if it had a business service impact model and change event feeds (showing when a system/process/application was switched to debug options), it would have reduced the time further, or even detected the problem when the first abnormality appeared after the change. In IT, most problems can be directly correlated to change. Therefore, in any IT Operations Analytics solution, collecting change events and associating them with the issue is very important. There are BSM solutions that will do that.


In any ongoing operation, analytics plays a vital role in mapping the collected data into useful information like this. BSM has demonstrated how to detect issues using performance metrics data; this is what should be sold internally as well for our partner data, which is at migration status right now. IT Operations Analytics plays a big role in keeping things in order when everything goes LIVE.