By Nimish Shelat, Product Marketing Manager, HP Automation and Cloud Management
Inside many IT organizations, incident resolution is often a surprisingly manual and complex process. Even when an organization implements event consoles like HP Operations Manager i (OMi) to compile events across multiple domains and weed out irrelevant or duplicate data, Tier 1 and Tier 2 operations still spend much of their workdays responding to alarms and putting out fires.
But what if most of those firefighting exercises could be eliminated with IT Process Automation? Let’s take a look at how incident remediation works when HP OM operates in concert with IT Process Automation and IT Process Orchestration.
Let’s say that your IT environment experiences 68 million raw events per day (as one HP customer did). HP OM will automate the collection, correlation and deduplication of these events, prioritize them based on their business impact, and then apply automatic actions to fix common problems. This is an excellent start: as the HP customer found out, it can cut the number of alerts you need to address to 5,000.
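To make the deduplication and prioritization step concrete, here is a minimal sketch in Python. It is not HP OM's implementation or API: the `Event` fields, the `(source, symptom)` grouping key, and the numeric `business_impact` scale are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical event record; the field names are illustrative, not HP OM's schema.
@dataclass(frozen=True)
class Event:
    source: str
    symptom: str
    business_impact: int  # assumed scale: higher means more important


def deduplicate_and_prioritize(events):
    """Collapse events sharing (source, symptom) into one alert, then sort by impact."""
    grouped = defaultdict(list)
    for event in events:
        grouped[(event.source, event.symptom)].append(event)

    alerts = []
    for (source, symptom), group in grouped.items():
        alerts.append({
            "source": source,
            "symptom": symptom,
            "count": len(group),  # how many raw events were folded into this alert
            "business_impact": max(e.business_impact for e in group),
        })

    # Highest business impact first; ties are broken by repeat count.
    alerts.sort(key=lambda a: (a["business_impact"], a["count"]), reverse=True)
    return alerts


raw_events = [
    Event("db01", "disk_full", 3),
    Event("db01", "disk_full", 3),
    Event("web02", "service_down", 5),
    Event("db01", "disk_full", 4),
]
alerts = deduplicate_and_prioritize(raw_events)
for alert in alerts:
    print(alert)
```

In this toy run, four raw events collapse into two alerts, and the service outage is listed ahead of the repeated disk warning. A real event console correlates across far more dimensions, but the shape of the reduction is the same.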
However, the work of resolving 5,000 alerts still adds up. Here’s why: when the OM enterprise console presents an alert to a Tier 1 Operations team, the team manually turns to reference documentation such as runbooks, knowledge bases, or their own tribal knowledge (or maybe just a note tacked up on a cubicle wall; don’t kid yourself, it happens).
Fig. 1: How manual incident remediation processes work with HP Operations Management.
But what if first responders can’t resolve the event? Then Tier 1 must escalate to Tier 2 subject matter experts for manual troubleshooting, triage and (ideally) repair (Figure 1, above). Even then, some alerts will not get resolved, at which point Tier 2 administrators create an incident that is routed to an Infrastructure or Applications team to investigate further.
Clearly this can be a long, manual process of investigations, trial-and-error fixes and hand-offs involving one or more IT staff members.
How OM and Operations Orchestration fully automate incident resolution
Operations Orchestration (OO) can replace many of the most repetitive processes that Tier 1 and Tier 2 administrators use for investigation and repair (Figure 2).
Fig. 2: How process automation remediates incidents with HP Operations Management and HP Operations Orchestration
When OM registers an event, it uses policies with criteria you set to trigger OO automated processes for incident resolution. Depending on the event and the policies, OO launches step-by-step logical flows for diagnosis and self-healing repair, delivering alert acknowledgments and annotations with detailed information that operators can review (Figure 3). OO records all flow execution activity for auditing and reporting and, when necessary, automatically creates enriched incident tickets for the Service Desk.
Fig. 3: Example of an HP Operations Orchestration flow
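The policy-to-flow hand-off described above can be sketched in a few lines of Python. This is an illustrative model only, not the OM or OO API: `Policy`, `Orchestrator`, the two sample flows, and the ticket structure are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical policy: a matching criterion plus the flow to launch when it matches.
@dataclass
class Policy:
    name: str
    matches: Callable[[dict], bool]
    flow: Callable[[dict], bool]  # returns True when the repair succeeded


@dataclass
class Orchestrator:
    policies: list
    audit_log: list = field(default_factory=list)  # one entry per flow execution
    tickets: list = field(default_factory=list)    # stand-in for the Service Desk

    def handle(self, event):
        for policy in self.policies:
            if policy.matches(event):
                resolved = policy.flow(event)
                self.audit_log.append((policy.name, event["id"], resolved))
                if resolved:
                    return "resolved"
                break  # the flow ran but could not repair the problem
        # No policy matched, or the flow failed: open a ticket enriched
        # with whatever the flows already recorded for this event.
        history = [entry for entry in self.audit_log if entry[1] == event["id"]]
        self.tickets.append({"event": event, "history": history})
        return "ticketed"


def restart_service(event):
    # Stand-in for a diagnose-and-repair flow that succeeds.
    event["annotation"] = "service restarted automatically"
    return True


def free_disk_space(event):
    # Stand-in for a flow whose repair attempt fails.
    event["annotation"] = "cleanup attempted, disk still full"
    return False


oo = Orchestrator(policies=[
    Policy("restart", lambda e: e["symptom"] == "service_down", restart_service),
    Policy("disk", lambda e: e["symptom"] == "disk_full", free_disk_space),
])

outcomes = [
    oo.handle({"id": 1, "symptom": "service_down"}),
    oo.handle({"id": 2, "symptom": "disk_full"}),
    oo.handle({"id": 3, "symptom": "unknown"}),
]
print(outcomes)
```

The first event is repaired and annotated, the second is ticketed along with the record of the failed repair, and the third is ticketed because no policy matched. An operator-assisted variant would simply pause inside a flow and wait for a decision before continuing.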
Operator-Assisted Incident Resolution
One variation on this fully automated model is to incorporate operator assistance. In this scenario, the OM event alert goes to Tier 1 Operations, which may choose to launch “guided” HP OO flows from the enterprise console menu and make decisions interactively.
Of course, not every event will be resolved through OO incident remediation flows, but the flows can address the vast majority of events in a consistent, standardized way. For example, the HP customer I mentioned above was able to reduce its 5,000 alerts to a much more manageable 1,500. Integrating OM and OO allows Tier 1 and Tier 2 personnel to focus their efforts on the alerts that remain.
Experience HP Operations Orchestration for free
The new HP Operations Orchestration Community Edition is a free download of the OO platform with out-of-the-box content packs for automating incident remediation. It is designed for easy self-installation, so you can begin experiencing the power of IT process automation and IT operations orchestration within two hours.