Disaster Recovery and Change Management
Disaster Recovery and Change Management: The Right Way
Written by Alex Belyarchik, CBCP, CORM, ITIL
June 23, 2014
Organizational resiliency has always been about building a sustainable program and maintaining it through tests and exercises. However, as IT environments grow and change more often, disaster recovery (DR) is becoming a missing link that is forgotten time and time again. Based on my personal experience in consulting, organizational culture plays a very important role in allowing many critical changes to slip through the gates of disaster recovery control. We therefore face a critical need to track and monitor all corporate changes in order to keep the program up to date.
One of the critical aspects of disaster recovery has always been dealing with day-to-day plan updates. Has your company decommissioned hardware without updating the disaster recovery plans to reflect the change? Then this article is for you. I will share what I have learned about taking organizational resiliency to the next level by setting strict controls and gates to ensure that changes are controlled and reflected in plan updates.
Let’s imagine a simple scenario that is typical of most of the companies with which I work. A request for a change comes in, gets approved, and is implemented. Everybody worked hard to build out the infrastructure footprint, so how can it be maintained and not left out of the equation? The answer relies heavily on senior management and on obtaining their support for and buy-in into the DR program. The first step, therefore, is to talk to senior management through presentations or meetings, backed by supporting documentation with real-life examples of companies that failed to survive a crisis because their DR footprint was not ready to go when needed.
The second step is inserting the corporate disaster recovery team into the project management office or the change management process. The goal is to ensure that when a subject matter expert (SME) submits a request for change (RFC), it not only goes through the normal approval process but is also approved by the disaster recovery team. By sitting on the change control board, the team can evaluate the proposed changes for compliance with DR/BCP policies. A change could be any service improvement in the organization, such as a new server, an updated version of software, a driver update, an increase in disk space, or an operating system update. A typical RFC answers the following questions:
- Who is requesting the change, and who is the department manager?
- What is the description of the change?
- What is the benefit of the change?
- When is the change requested, and how long will it take?
- Is there a production impact?
- What is the change classification?
- Which hardware is involved?
In addition to all of the above, an RFC should also include the release instructions for the engineer to perform, the roll-back instructions, and a test plan to verify that the change was tested in a development environment.
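To make the RFC contents and the extra approval gate concrete, here is a minimal sketch in Python. The field names, the approver labels, and the helper functions are all hypothetical; they simply mirror the questions above and are not taken from any particular change management tool.

```python
from dataclasses import dataclass, field

# Hypothetical approver labels: the normal board plus the DR team.
REQUIRED_APPROVERS = {"change_control_board", "disaster_recovery"}


@dataclass
class ChangeRequest:
    # One field per question a typical RFC answers (names are illustrative).
    requester: str
    department_manager: str
    description: str
    benefit: str
    requested_for: str          # when the change is requested
    duration_hours: float       # how long the change is expected to take
    production_impact: bool
    classification: str
    hardware: list
    # Supporting sections the RFC should also carry.
    release_instructions: str = ""
    rollback_instructions: str = ""
    test_plan: str = ""
    approvals: set = field(default_factory=set)


def missing_sections(rfc):
    """Return the names of supporting sections that are still empty."""
    names = ("release_instructions", "rollback_instructions", "test_plan")
    return [name for name in names if not getattr(rfc, name).strip()]


def is_approved(rfc):
    """An RFC passes only when it is complete and every required team signed off."""
    return not missing_sections(rfc) and REQUIRED_APPROVERS <= rfc.approvals


rfc = ChangeRequest(
    requester="jdoe",
    department_manager="asmith",
    description="Increase disk space on the database server",
    benefit="Avoids running out of space during month-end processing",
    requested_for="2014-07-01",
    duration_hours=2.0,
    production_impact=True,
    classification="normal",
    hardware=["prod-db-01", "dr-db-01"],
    release_instructions="Extend the data volume by 200 GB",
    rollback_instructions="Restore the volume from the pre-change snapshot",
    approvals={"change_control_board"},
)

print(missing_sections(rfc))  # the test plan has not been written yet
print(is_approved(rfc))       # not approved: no test plan, no DR sign-off
```

In this sketch the request stays blocked until the test plan is filled in and the disaster recovery team adds its approval, which is the gate described above.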
One real-life problem I have faced again and again is that the requestor of a change does not always ensure that the change is implemented on the disaster recovery footprint as well. The DR infrastructure then gets out of sync with production, which is exactly the kind of issue companies want to avoid. A typical change goes through the approval process of all IT teams and of the lines of business that it impacts. With a seat in that process, the disaster recovery team has a say in what can be approved and what needs to change in order to satisfy audit requirements. While involved with the complete DR build-out for one of our major financial clients, I saw many cultural changes take place in the organization. One of the outcomes of this critical change management process improvement was that SMEs became more and more aware of the new DR footprint that was made available to them. You may have been part of organizations that did not care much for their recovery solution, and sometimes were not even aware that they had the technology available to them in the first place. This is the time to make sure that SMEs become more aware of their own disaster recovery footprint so that they maintain and properly update it.
Another critical step that should not be skipped is listing the disaster recovery servers along with the production servers in every request for change. This builds awareness of the program and ensures that the release engineers do not overlook them. After all, it is the disaster recovery manager’s responsibility to audit changes for compliance with corporate policies.
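A check like this can be automated if the team keeps a mapping from each production server to its DR counterpart. The sketch below assumes such a mapping exists; the server names and the `missing_dr_servers` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical inventory: production server -> its DR counterpart.
DR_COUNTERPARTS = {
    "prod-db-01": "dr-db-01",
    "prod-app-01": "dr-app-01",
}


def missing_dr_servers(rfc_hardware, dr_counterparts):
    """Return DR servers whose production twin is in the RFC but which are not listed."""
    listed = set(rfc_hardware)
    missing = set()
    for server in rfc_hardware:
        dr_server = dr_counterparts.get(server)
        # Servers without a known DR counterpart are skipped, not flagged.
        if dr_server is not None and dr_server not in listed:
            missing.add(dr_server)
    return sorted(missing)


# The RFC lists both production servers but only one DR server.
rfc_hardware = ["prod-db-01", "prod-app-01", "dr-app-01"]
print(missing_dr_servers(rfc_hardware, DR_COUNTERPARTS))
```

Here the request would be sent back to the requestor because `dr-db-01` is not included alongside `prod-db-01`.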
The third most important aspect of a successful disaster recovery program is to audit every newly proposed change and require SMEs to update their plans accordingly. You may be amazed by the number of changes that are not reflected in the recovery plans. Many organizations hand plan updates to dedicated staff, while others let their SMEs update their own plans. There is no one-size-fits-all approach. Either way, it is essential not only to track the changes and ensure that they are reflected on the disaster recovery footprint, but also to make plan builders go back and update their documentation. IT is an interesting world where nobody wants to do the paperwork. This is where the disaster recovery manager comes in. Typically, plan updates can be enforced with a direct email to the SME requesting that the proposed change also be reflected in the recovery documentation.
Going back to my real-life examples, enforcing plan updates can be yet another difficult task. However, with a few proper steps, you can make sure that they are not overlooked. One critical step is to always keep senior management aware of the status of the change management and plan approval process. If an SME is non-compliant, raise the issue with his or her immediate supervisor and, if needed, ask for it to be escalated to the SME’s manager. This ensures that audit findings are acted on and escalated appropriately. Furthermore, in order to run a successful disaster recovery program and polish the change management process, it is essential to be proactive about plan updates. Don’t shy away from holding release approval until the changes are reflected in plan updates; this is a common practice among disaster recovery teams. Is the DR hardware missing from the RFC? Are the locations not clearly defined? Has the recovery plan been updated with the new hardware, or does it still contain what is now decommissioned? This is the time to take control and ensure that all of the work that went into building the DR footprint does not go up in smoke.
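The release-hold questions above can be expressed as a simple gate. The following is a minimal sketch under stated assumptions: the recovery plan is reduced to a set of hardware names, locations to a dictionary, and the function name and host names are hypothetical.

```python
def release_hold_reasons(added, decommissioned, locations, plan_hardware):
    """Return the reasons to hold release approval; an empty list means release can proceed.

    added / decommissioned: hardware names affected by the change
    locations: hardware name -> location, as stated in the RFC
    plan_hardware: hardware names currently listed in the recovery plan
    """
    reasons = []
    for host in added:
        if not locations.get(host):
            reasons.append(f"location not defined for {host}")
        if host not in plan_hardware:
            reasons.append(f"recovery plan does not list new hardware {host}")
    for host in decommissioned:
        if host in plan_hardware:
            reasons.append(f"recovery plan still lists decommissioned hardware {host}")
    return reasons


# A DR database server is being replaced, but the plan has not been updated yet.
added = ["dr-db-02"]
decommissioned = ["dr-db-01"]
locations = {"dr-db-02": "secondary data center"}
stale_plan = {"dr-db-01", "dr-app-01"}
updated_plan = {"dr-db-02", "dr-app-01"}

for reason in release_hold_reasons(added, decommissioned, locations, stale_plan):
    print("HOLD:", reason)

# Once the SME updates the plan, nothing blocks the release.
print(release_hold_reasons(added, decommissioned, locations, updated_plan))
```

Each reason returned can go straight into the email to the SME, and the same list gives senior management a clear status of what is still blocking the release.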