This article is written with inputs from my colleague Dave Trout.
What's IT Operations Analytics (ITOA) anyway?
According to Forrester, ITOA is "the use of mathematical algorithms and other innovations to extract meaningful information from the sea of raw data collected by management and monitoring technologies."
You may have seas of data with a wealth of information in them, but somebody has to do the legwork of sifting through it all and turning it into meaningful analysis that can influence business outcomes. That somebody will be greatly helped by the new breed of tools popularly known as IT operations analytics tools.
What's different about ITOA, compared with today's monitoring and other present-day IT operations tools?
Traditional IT operations tools have themselves undergone several revolutionary changes over the last few decades to keep up with the growing demands of modern IT. Even so, these tools are not geared to handle new use cases such as searching and data mining - needed for the up-and-coming problems of virtualization and cloud - or to handle today's sheer volume of data, be it fault entries from log files, performance metrics, configuration changes or transactional information.
To their credit, the classic IT Ops tools have taken the form that their users and administrators asked for - so you find these tools adhering more or less to the 'management by exception' model: they alert only when they find a problem, driven by thresholds, log patterns or signatures of known problems. Keyword searching and data mining were never considered pure IT operations work.
Now we need tools that can actually mine through lots of disparate data and draw meaningful analysis from it to solve the IT problem. The data itself can be structured - like system and application logs, database rows, and delimited files with columns and rows - or unstructured, like survey responses to open-ended questions, or videos showing behaviour and sentiment.
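To make the structured/unstructured distinction concrete, here is a minimal sketch: when a log line follows a known layout, named fields can be pulled out directly. The line format and field names below are assumptions for illustration, not any specific product's log format.

```python
import re

# A hypothetical syslog-style line; format and fields are illustrative only.
line = "2015-06-01T10:42:03 host-17 app=checkout level=ERROR msg=timeout"

# Structured data: a known layout lets us extract named fields directly.
pattern = re.compile(
    r"(?P<ts>\S+)\s+(?P<host>\S+)\s+app=(?P<app>\S+)\s+"
    r"level=(?P<level>\S+)\s+msg=(?P<msg>.+)"
)
fields = pattern.match(line).groupdict()
print(fields["level"])  # ERROR
```

Unstructured data (free text, video, sentiment) has no such layout to match against, which is precisely why it needs the heavier analytics machinery discussed here.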
The key to such data-mining exercises is to arrive at the solution without getting stymied by the problem definition itself. This is a distinguishing trait of ITOA: 'finding' the problem (whereas, with IT Ops tools, the problem to be alerted on is usually pre-defined). That alone requires specialized tools, and the skill to work with them and to discern and analyse the results. The solution, too, requires specialized tools and skills.
These tools are primarily the operations analytics tools - to date, this has been the extended responsibility of Business Intelligence reporting teams and tools. However, BI and reporting tools have stayed away from the data types and formats we are seeing appear today, and do not attempt the sort of data aggregation, correlation and other munging that these analytics tools do.
Is a lot of data really 'Big' Data?
Before we jump headlong into the discussion of how big data might help solve some problems for us, a word of caution. Your IT might be spewing lots of data, but that is not necessarily classifiable as the 'big' data we are talking about here. Big data is well organized, and it is collected and accessible in a centralized manner.
Even if the data and its processing are split across multiple elements in a cluster or grid, it is important to preserve a sense of wholeness in the way the data is accessed. That is an important quality of big data.
A lot of dispersed data is NOT big data.
So what could you do with an operations analytics tool beyond what you would attempt with the current-day business service/operations monitoring tools?
Coming to the topic of this blog, here are four things you could do with the operations analytics toolset to realize the end goal: reduce IT costs while keeping operations up, running and performing well.
Leave no stone unturned - collect all data. You never know when you might correlate the energy spend with the fact that several servers have not had power management turned on. Think of how any source, even disaster warnings, might be useful for your dashboards and analytics. Some consider Twitter trends a good source for decision making - in marketing, for instance.
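The energy-spend example above can be sketched in a few lines: group energy use by whether power management is enabled and compare. The inventory and numbers are entirely hypothetical, just to show the kind of correlation that only appears once you collect both data sets.

```python
# Hypothetical inventory rows: (server, power_mgmt_enabled, monthly_kwh).
servers = [
    ("srv-01", True, 120),
    ("srv-02", False, 310),
    ("srv-03", False, 295),
    ("srv-04", True, 130),
]

def avg_kwh(enabled):
    """Average monthly energy use for servers with the given setting."""
    vals = [kwh for _, pm, kwh in servers if pm is enabled]
    return sum(vals) / len(vals)

# A large gap between the two groups points at where the spend is going.
print(avg_kwh(True), avg_kwh(False))  # 125.0 302.5
```

Neither the server inventory nor the energy meter readings would normally live in the same tool - which is exactly the point of collecting everything.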
Look for patterns of occurrences and anomalies - don't do just threshold-based monitoring; look for anomalies. Try visual correlation. Find that recurring JVM heap-size increase that occurs only at a specific time every month, then go through the motions of triage and fix.
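One simple way to go beyond fixed thresholds is to flag values that deviate sharply from the series' own behaviour. Below is a minimal z-score-style sketch, with invented heap-size samples; real anomaly detection in these tools is far richer, but the idea is the same.

```python
from statistics import mean, stdev

# Hypothetical JVM heap samples in MB; values are illustrative only.
heap_mb = [512, 520, 515, 518, 516, 514, 519, 513, 900, 517]

mu, sigma = mean(heap_mb), stdev(heap_mb)

# Flag samples more than two standard deviations from the mean,
# rather than alerting on a pre-set fixed threshold.
anomalies = [(i, v) for i, v in enumerate(heap_mb) if abs(v - mu) > 2 * sigma]
print(anomalies)  # [(8, 900)]
```

Note that no threshold was configured in advance: the series itself defines what counts as abnormal, which is the shift from 'management by exception' to analytics.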
Correlate data across the various spheres of your IT - break down departmental silos. With virtualization in play, a problem is no longer purely a network problem, a storage problem or just an app problem. There's a good chance it is a combination of several things, showing multiple different symptoms at the same time. The data will point clearly to the problem.
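Cross-silo correlation often comes down to joining separate teams' time series on a common timestamp. A minimal sketch, with hypothetical storage-latency and application-error samples:

```python
# Hypothetical per-minute samples from two teams' tools; values illustrative.
storage_latency_ms = {"10:01": 5, "10:02": 48, "10:03": 52, "10:04": 6}
app_errors = {"10:01": 0, "10:02": 9, "10:03": 11, "10:04": 1}

# Join the two series on timestamp so one view shows both symptoms at once.
combined = {
    ts: {"latency_ms": storage_latency_ms[ts], "errors": app_errors[ts]}
    for ts in storage_latency_ms.keys() & app_errors.keys()
}

# Minutes where both symptoms spike together suggest one underlying cause.
suspect = sorted(
    ts for ts, v in combined.items()
    if v["latency_ms"] > 40 and v["errors"] > 5
)
print(suspect)  # ['10:02', '10:03']
```

Neither team's dashboard alone shows the coincidence; the joined view does.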
Predict, hypothesize, compare - synthesize the data streams into a single plane and be able to replay occurrences. If you cannot solve the problem by hypothesis alone, a TiVo-style 'replay of the situation' can lead you closer to the problem in a reconstructed manner - something detectives solving mystery crimes have done in the movies. Anybody remember Minority Report?
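The 'replay' idea reduces to keeping events from all streams in time order and stepping through a chosen window. A minimal sketch with hypothetical event data:

```python
# Hypothetical merged event stream: (timestamp, source, message).
events = [
    (1, "net", "link flap on switch-3"),
    (2, "db", "connection pool exhausted"),
    (3, "app", "checkout latency > 2s"),
    (4, "net", "link restored"),
]

def replay(events, start, end):
    """Yield events with timestamps in [start, end], in time order."""
    for ts, source, msg in sorted(events):
        if start <= ts <= end:
            yield ts, source, msg

# Step through the window around the incident, across all silos at once.
window = list(replay(events, 2, 3))
```

Real tools add buffering, indexing and visualization on top, but the reconstruction is the same: a single ordered timeline across sources.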
If the new norm is "the devil is within and between the data sets", IT operations can adopt big-data analytical approaches to improve its way of working and uncover previously unknown problems - the gnawing issues in the underbelly of cross-team ownership that nobody notices or pays heed to, but which can be huge differentiators for operational prowess. The power of these tools is multiplied by the capability to infer across multiple data types and over time, exploiting the rich operational context from when issues occurred.
A popular joke doing the rounds - thanks mainly to the hype around big data - goes: "Big data is a solution waiting for a problem."
Don't fall for the hype. Remember that nothing comes for free: operational analytics requires greater skills and mastery of data sets and data analytics. Triage and prediction using data analytics are done best by people trained in the area.
Ramkumar Devanathan (twitter: @rdevanathan) is Product Manager for HPE Cloud Optimizer (formerly vPV). He was previously a member of the IOM-Customer Assist Team (CAT), providing technical assistance to HP Software pre-sales and support teams with Operations Management products including vPV, SHO, VISPI. He has more than 14 years of experience in this product line, working in roles ranging from developer to product architect.