Guest post by Gary Brandt, Functional Architect, HP Enterprise Services IT
It’s always pretty intense inside a company the month or so before it hosts a major customer conference and product showcase. It’s no different at HP. Our HP Discover in Las Vegas each June is one of two annual marquee events (the other being HP Discover Barcelona in December), and everyone is working in overdrive to put our best face on and impress customers.
In other words, it’s really not a good time to experience a major network outage.
But on May 20 — just two and a half weeks before HP Discover — that’s just what happened.
Fortunately, HP’s Global IT, which provides IT services to HP’s 300,000 employees, was able to use the new HP Operations Analytics tool we were evaluating to quickly identify the key contributing factors that led to a widespread network problem and get the company back to work.
It started as a production network outage affecting many employees at HP sites in the Americas and around the world. Soon we got a call from the HP-IT Global Support team asking what was up with HP Network Node Manager (NNM): it was generating thousands of critical events in the dashboard, and they were overwhelmed by data. There was just too much noise.
When we checked it out, NNM was doing its job, indicating a transient event storm, with the same errors appearing across different regions.
We went into triage mode, scrambling to pull in experts from the global Support and Telecom teams to troubleshoot and find a way around the problem, so employees could stay productive.
Operations Analytics to the rescue
As it happened, I was in the midst of evaluating how HP Operations Analytics could integrate with existing metrics collection products across multiple silos. We were learning how Ops Analytics could pull in feeds to consolidate metrics and provide visual analytics of operational Big Data. As part of this evaluation, we were already pulling in some network data from production.
So we thought, “Let’s throw Ops Analytics at the problem and see what it can do.”
In the webinar, I’ll also discuss how HP IT incorporates best operational practices to collect and analyze structured and unstructured data using big data analytics at enterprise scale. Beyond the network outage described above, I’ll share how HP IT has used Ops Analytics to:
Troubleshoot an application outage caused by a configuration change
Find the root cause of an Exchange performance problem
Identifying the root cause of our network outage demonstrated to me the potential of Ops Analytics. Without it, the Support and Global Telecom teams would have spent days longer searching through logs and manually correlating data to narrow down the potential causes. Instead, Ops Analytics quickly guided the teams toward virtually hidden information critical to restoring service.
I hope you’ll join me in the webinar to hear more!
Click here to learn how Operations Analytics helps you combine all your operational data, gathering metrics, events, topology, and log file data from all your IT systems into a comprehensive view, so you can make use of your existing investments.