Guest post by Shimrit Yacobi, Engineer, HP IT Operations
This article walks through a use case for the OLI solution: collecting and storing all log data to support fast and efficient real-time incident management, from the point of view of a Subject Matter Expert (SME).
HP Operations Log Intelligence (OLI) is designed to offer Log Management capabilities to IT Operations teams. It collects and analyzes log data from any log-generating source.
OLI uses high-speed, high-compression technology to store log entries for years while still returning fast results on searches launched by IT Operations teams and Subject Matter Experts (SMEs). With OLI, technology and cost no longer limit an organization's ability to store and analyze data.
OLI dashboards allow IT support staff at all levels to investigate log data according to the best practices defined by your operator. Dashboards are designed to coordinate cross-team investigations and enhance collaboration. OLI includes out-of-the-box (OOTB) dashboards tailored to IT Operations needs, and offers the ability to easily design and create custom dashboards.
Bill is an SME monitoring an increasing number of log files in his virtualized environment. He has been experiencing performance issues with Windows and needs to find the root cause of the problem. Bill has a Windows application running on top of an Apache web service. Because he has used OLI to collect Apache and Windows log data, Bill has the data readily available to analyze and can diagnose the root cause of the issues using the OLI application.
Bill opens the OLI web application and begins on the summary page, which gives him a view of all the system components being monitored:
Bill scans the summary page for suspicious behaviour that might be related to the Windows issues he encountered. He notices that the apache_access_file agent count is higher than usual.
He clicks on the apache_access_file to drill down and is directed to the analyze page.
The drill-down operation automatically creates a query for the analyze page. Bill sets the time range to cover the period in which his issues occurred. This results in an aggregate bar chart displaying the number of requests for every day within the selected time range.
As an experienced SME, Bill knows that 2000 requests sent from one host within one hour could point to a cause of his performance issues.
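To make Bill's rule of thumb concrete, here is a minimal Python sketch of the same check done outside OLI. It is not OLI's query language or API. It assumes the access log uses the standard Apache Common Log Format, and the function names and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format: Apache Common Log Format, which starts with
# "host ident user [day/month/year:hour:minute:second zone]".
CLF_PREFIX = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def requests_per_host_hour(lines):
    """Count requests for each (host, hour) pair found in the log lines."""
    counts = Counter()
    for line in lines:
        match = CLF_PREFIX.match(line)
        if match is None:
            continue  # skip lines that do not match the assumed format
        host, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        # Truncate the timestamp to the start of its hour.
        hour = when.replace(minute=0, second=0)
        counts[(host, hour)] += 1
    return counts


def busy_hosts(counts, threshold=2000):
    """Keep only the (host, hour) pairs at or above the threshold."""
    return {key: n for key, n in counts.items() if n >= threshold}
```

Passing the lines of an access log to `requests_per_host_hour` and the result to `busy_hosts` would list every host that reached 2000 requests in a single hour, which is the pattern Bill is looking for.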
He decides to drill deeper into the problem and opens the Apache Web Server dashboard, a preconfigured customized dashboard.
The dashboard allows him to view IT logs and machine data through a combination of different aggregations and charts over time. Bill adjusts the date range to see the analysis of the day in question on a single screen.
Bill investigates the decoded data from log files on the dashboard, and reviews the server behaviour for the last day. He looks for any suspicious error messages with a high severity, but none appeared in the logs according to the “Apache Error Count by Severity” pane.
Bill looks at the aggregate count of requests and notices a few peaks of 1000-2000 requests. Looking at the chart, he sees a recurring anomaly: bursts of load on the Apache server within the same time range in which he encountered the performance issue.
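The kind of recurring peak Bill spots on the chart can also be checked numerically. The sketch below is a simplified illustration, not how OLI detects anomalies. It assumes the request counts have already been aggregated into one number per hour, and the function names and the threshold of 1000 are assumptions taken from the low end of the peaks described above.

```python
def find_peaks(hourly_counts, threshold=1000):
    """Return the positions of the hours whose count meets the threshold."""
    return [i for i, n in enumerate(hourly_counts) if n >= threshold]


def peak_gaps(peak_indices):
    """Return the gaps, in hours, between consecutive peaks."""
    return [later - earlier
            for earlier, later in zip(peak_indices, peak_indices[1:])]
```

If the gaps between peaks are all equal or nearly equal, the load is arriving on a schedule, which is a hint to look for a scheduled job or script rather than ordinary user traffic.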
Bill realizes there was an Apache script running every few hours that caused the peaks that eventually resulted in the Windows performance issues. After removing the script, the anomalies stopped and the corresponding Windows performance issues stopped as well. Using the OLI application, Bill determined that the Apache script was the root cause of the performance issues he encountered. Bill can now share his dashboards with IT staff at all levels in his organization to pinpoint and prevent similar performance issues in the future.