There’s a high probability that at least one of those things will be the cause of a major IT disruption. According to a Forrester Research 2013 survey, one third of respondents had declared a disaster in the past five years. The top five culprits are listed above. Now imagine being out for 30-plus hours, as 1 in 5 survey respondents were.
Don’t be scared, be prepared
With the risks so high, IT organizations must implement and continuously test and update disaster recovery (DR) and business continuity (BC) procedures. But testing DR procedures is a time-consuming and resource-intensive task that involves multiple subject-matter experts (SMEs) from across IT. A typical test for a large organization can involve dozens of people on multiple conference calls for up to a full day.
It’s little wonder that testing typically happens so infrequently. The Forrester survey revealed that 39 percent of firms conduct a full test (a live or simulated failover of all infrastructure at a site) only once a year. Yet DR procedures really should be tested every time a major change is implemented on an application.
Orchestrate your recovery
In many ways, IT process automation and orchestration is a perfect fit for DR procedure testing: it reduces the resources you expend and improves success rates.
Consider the characteristics of any disaster recovery or failover exercise:
Requires a number of tasks that need to be performed in a very specific sequence
Tasks often span a number of different IT domains — server, network, storage, and others
Tasks require a number of different SMEs, including network engineers, database administrators, server administrators, and others
Success depends on coordination and handoffs between these SMEs
IT process automation and orchestration makes all of this faster and easier. Workflows that tie together diverse tools, processes and domains significantly reduce the risk of failure. And because workflows capture and essentially document the process, you are also protected against the risk that key personnel or groups will be unavailable.
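As a rough illustration of the idea (this is not HP OO's workflow format or API), the sketch below models a workflow as an ordered list of steps, each tagged with the IT domain that would normally own it. The `Step` class, the `run_workflow` function, and the placeholder actions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str                    # what the task does
    domain: str                  # IT domain that normally owns it (server, network, ...)
    action: Callable[[], bool]   # returns True on success


def run_workflow(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order, stop at the first failure, and return a log."""
    log = []
    for number, step in enumerate(steps, start=1):
        ok = step.action()
        status = "ok" if ok else "FAILED"
        log.append(f"{number}. [{step.domain}] {step.name}: {status}")
        if not ok:
            return False, log
    return True, log


# Placeholder actions; a real workflow would call the relevant tools here.
steps = [
    Step("Verify network is operational", "network", lambda: True),
    Step("Validate destination storage health", "storage", lambda: True),
    Step("Fail over database", "database", lambda: True),
]

succeeded, log = run_workflow(steps)
print("\n".join(log))
```

Because the sequence and the log live in the workflow rather than in someone's head, the same run doubles as a record of what was done and in what order.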
How OO workflows automate disaster recovery of an email system
Let’s look at an example of how HP Operations Orchestration drives efficiency and reduces errors by automating a number of repetitive and tedious tasks. Below is an HP OO workflow for automating the disaster recovery procedure for an email system:
Figure 1: Implementation of a disaster recovery process using HP OO
The HP OO workflow above can be triggered when a change ticket declaring the DR event is approved. Here are the steps it follows:
1. The DR event is declared (real or test)
2. Verify that the change requests in service desk systems (such as HP Service Manager) are approved
3. Verify that the network is operational
4. Validate the health of the destination systems, including server and storage
5. Verify that the configuration of the destination system is the same as that of the source system, including the database (SQL Server), application servers (Exchange) and Web servers
6. Clone the destination server if the source and destination are not the same
7. Disable monitoring and clustering on the primary systems
8. Perform failover tasks:
   a. Disconnect users and disable new connections
   b. Open connections into destination systems
   c. Reroute Domain Name System (DNS) entries to point to destination systems
   d. Deactivate primary systems
9. Validate the availability of service for the new system
10. Update the change request ticket in the service desk system
11. Update the configuration management database (CMDB) with current status, and view reports to verify that failover completed successfully
12. Re-enable monitoring and clustering
13. Notify users and stakeholders
14. Declare the DR event complete
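The configuration check and the conditional clone in the steps above can be sketched as a simple comparison. This is a minimal illustration, not how HP OO implements the check: the `config_drift` function, the setting names, and the version strings are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Mismatch = Tuple[Optional[str], Optional[str]]


def config_drift(source: Dict[str, str], destination: Dict[str, str]) -> Dict[str, Mismatch]:
    """Return {setting: (source_value, destination_value)} for every mismatch."""
    keys = source.keys() | destination.keys()
    return {
        key: (source.get(key), destination.get(key))
        for key in sorted(keys)
        if source.get(key) != destination.get(key)
    }


# Hypothetical inventory values, for illustration only.
source = {"sql_server": "version A", "exchange": "version B", "web_server": "version C"}
destination = {"sql_server": "version A", "exchange": "version B-old", "web_server": "version C"}

drift = config_drift(source, destination)
needs_clone = bool(drift)  # the clone step runs only when the check finds a mismatch
print(drift, needs_clone)
```

Returning the mismatched settings, rather than a bare yes or no, gives the workflow something concrete to record in the change ticket.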
There are also two automated sub-workflows built in at Step 6, for cloning the destination server, and Step 8, for the failover from source to destination:
Figure 2: Sub-workflow for cloning destination server
Figure 3: Sub-workflow for failover from source to destination
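To make the notion of a sub-workflow concrete, here is a small sketch of the failover tasks as one unit that a parent workflow could call as a single step. It operates on an in-memory dictionary rather than real systems; the function names and state keys are assumptions made for this example and are not part of HP OO.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def disconnect_users(state: State) -> None:
    state["primary_accepts_connections"] = False


def open_destination(state: State) -> None:
    state["destination_accepts_connections"] = True


def reroute_dns(state: State) -> None:
    state["dns_target"] = "destination"


def deactivate_primary(state: State) -> None:
    state["primary_active"] = False


# Sub-steps in the order the article lists them.
FAILOVER_SUBSTEPS: List[Callable[[State], None]] = [
    disconnect_users,
    open_destination,
    reroute_dns,
    deactivate_primary,
]


def failover(state: State) -> List[str]:
    """Run the failover sub-workflow and return the names of the sub-steps executed."""
    executed = []
    for substep in FAILOVER_SUBSTEPS:
        substep(state)
        executed.append(substep.__name__)
    return executed


def service_available(state: State) -> bool:
    """The check that follows failover: is the service reachable at the destination?"""
    return state["dns_target"] == "destination" and state["destination_accepts_connections"] is True


state: State = {
    "primary_accepts_connections": True,
    "destination_accepts_connections": False,
    "dns_target": "primary",
    "primary_active": True,
}

executed = failover(state)
print(executed, service_available(state))
```

Packaging the sub-steps behind one `failover` call keeps the parent workflow short while the order of the sub-steps stays fixed in one place.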
Conducting such a complex disaster recovery procedure manually would clearly require a significant amount of time and resources, and chances are that your organization would not get around to testing its effectiveness as often as it should.
Automating and orchestrating a large number of disaster recovery tasks drives down the cost of critical disaster recovery planning. Furthermore, the procedures are more reliable and ready for an actual recovery event. Institutionalizing disaster recovery procedures in orchestration workflows also helps to communicate and document the procedure, and reduces your dependence on specific individuals or groups.
Experience HP Operations Orchestration NOW!
The new HP Operations Orchestration Community Edition is a free download of the platform with out-of-the-box content packs for automating incident remediation. It is designed for easy self-installation, so you can begin experiencing the power of IT process automation and IT operations orchestration within two hours.
Nimish Shelat is currently focused on Datacenter Automation and IT Process Automation solutions. Shelat strives to help customers, in both traditional and cloud-based IT, transform to a service-centric model.
The scope of these solutions spans server, network, database and middleware infrastructure. The solutions are optimized for tasks such as provisioning, patching, compliance and remediation, and for processes such as Self-healing Incident Remediation, Rapid Service Fulfilment, Change Management and Disaster Recovery.
Shelat has 23 years of experience in IT, 20 of them at HP across the networking, printing, storage and enterprise software businesses. Prior to his current role as a Manager of Product Marketing and Technical Marketing, Shelat held positions as Software Sales Specialist, Product Manager, Business Strategist, Project Manager and Programmer Analyst.
Shelat has a B.S. in Computer Science. He earned his MBA from the University of California, Davis, with a focus on Marketing and Finance.