IT Operations Management (ITOM)

How I get more out of monitoring: Why dynamic and predictive models identify faults quickly



Editor’s note: This is a guest post by Eli Eyal, Operational Support Services Manager at Playtech, the world’s largest supplier of online gaming and sports betting software and an HP Software customer.  


IT environments experience a very high rate of change today, much more than they ever have before. Rapid change requires rapid adaptation, and that is true of monitoring systems as well. In my role as the Operational Support Services (OSS) manager at Playtech, I need to make smart decisions about how to keep pace. I see myself as both a vendor and a customer—the investments I make in technology platforms and tools must provide value to the service that I offer the business.  


Monitoring is the eyes and ears of the business and a top priority. We need both to maintain the monitoring capabilities that already exist in Playtech’s IT environment and to find ways to add monitors for the new services and applications that are introduced almost daily. We always need more from our monitoring system: more information with greater automation, so that we can detect and remediate faults more quickly.


Dynamic monitoring for dynamic IT


Our analysis revealed that our monitoring system needed to adapt more to changing conditions, and that required us to focus on two key areas: dynamic monitoring and predictions.


Creating such an adaptive monitoring system with standard agent or agentless monitoring tools is very hard. Monitoring is usually performed on static objects, checking whether an application or a server is healthy or at fault, usually via traffic-light status indicators. But in our constantly changing IT environment, standard monitoring methods that report on static metrics like server CPU usage or the memory consumption of our Java application will miss many things that could help us prevent the next downtime.


Monitor your business baseline


Dynamic monitoring and predictions are both achieved by looking a bit beyond our static IT and instead considering what it is used for: our business. If you examine how your IT environment is used, you will probably find, as we did, that there is a seasonal pattern. This usage pattern enables remarkable monitoring capabilities that standard static monitoring cannot provide.


Let’s take a simple metric like business usage, which is the number of users that are using the service. The performance of your IT application can be measured not only by whether it is up or down but also by how many users are using it. Depending on the service that your application provides, you will find that it is used more at some points in time and less at others, a pattern that repeats over the course of a week, month or year. For example, you might find that usage is at its lowest point every Saturday, but on Monday it is at its highest.


By harnessing this information you can establish a baseline of known behavior and monitor subsequent behavior for anomalies. These anomalies will indicate problems in places you never thought you could monitor before: your ISP, your external links to other vendors, and even changes in your own business service.
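As a rough illustration of the idea (a minimal sketch, not Playtech’s actual implementation; the function names, data shape and three-sigma cutoff are my own assumptions), a weekly baseline can be built by grouping historical user counts per weekday-and-hour slot and flagging readings that deviate too far from that slot’s norm:

```python
from statistics import mean, stdev

def weekly_baseline(history):
    """Group hourly user counts by (weekday, hour) and return the
    mean and sample standard deviation for each slot.
    `history` is an iterable of (weekday, hour, users) tuples."""
    slots = {}
    for weekday, hour, users in history:
        slots.setdefault((weekday, hour), []).append(users)
    # A slot needs at least two samples to have a standard deviation.
    return {slot: (mean(v), stdev(v)) for slot, v in slots.items() if len(v) > 1}

def is_anomaly(baseline, weekday, hour, users, sigmas=3.0):
    """Flag a reading that deviates more than `sigmas` standard
    deviations from the learned baseline for that weekday/hour slot."""
    mu, sd = baseline[(weekday, hour)]
    return abs(users - mu) > sigmas * sd
```

With a baseline learned from a few weeks of Monday-morning samples, a sudden drop to a fraction of the usual user count would be flagged even though no server-side metric has crossed any static threshold.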


In my case, monitoring from different angles gives me the ability to understand whether there is a problem and determine its priority. It also helps me identify a root cause twice as quickly, especially when the cause is not in my environment. For example, my business relies on an ISP to connect users to our systems. When there is a problem with the ISP, standard monitoring provides no indication of it, because it monitors the environment from the inside. But with baseline monitoring in place, I know that there is a drop in my business activity, obviously a high-priority issue. Since there are no additional infrastructure alerts, I can start to search for the root cause outside of my environment, including checking external dependencies such as my ISP.


Prediction is the name of the game


The benefits of monitoring your dynamic usage KPIs are enormous. Not only do you no longer have to tune static thresholds, which reduces your maintenance to almost nothing, but the baseline business-activity statistics tell you what should happen tomorrow, next week and next month. You can also watch the real-time behavior in front of you and compare it to your predicted metrics. If reality diverges from the prediction because of a fault anywhere in the chain, you can move as quickly as possible to find that fault.
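To sketch how a prediction-based check can replace a hand-tuned static threshold (again an illustrative assumption on my part, including the function name and the 25% tolerance band, not a description of any specific product): the alert boundary is derived from the predicted value itself, so it moves with the business cycle and needs no manual retuning.

```python
def check_against_prediction(predicted, actual, tolerance=0.25):
    """Compare real-time activity to the predicted value for this
    time slot. Return an alert string when the actual value falls
    outside a relative band around the prediction; None otherwise.
    The band replaces a hand-tuned static threshold."""
    low = predicted * (1 - tolerance)
    high = predicted * (1 + tolerance)
    if actual < low:
        return "DROP: possible external fault (e.g. ISP or partner link)"
    if actual > high:
        return "SPIKE: unexpected load or a change in the business service"
    return None
```

If the predicted Monday-morning usage is 1,000 users, a reading of 900 passes quietly, while a reading of 700 raises a drop alert that points the investigation outward rather than at the infrastructure.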


In IT, knowledge is power. Being able to monitor things I never could before, and to provide more information, makes monitoring a more powerful tool for my business.


Learn more at HP Discover


Come hear more at my Business Service Management session in Las Vegas, “The five-step journey to predictive operations (BB3076)”. I will be on a panel of HP Software experts who will explore how you can evolve IT operations from a reactive to a predictive management state.


See also my webinar on this topic, and get more information on how HP provides Playtech with predictive analytics for IT operations.


Check out the complete HP Discover Las Vegas 2014 session catalog




