The State of IT Resiliency and Preparedness

Written by Rachel Dines, January 13, 2014
     
Forrester Research and the Disaster Recovery Journal have partnered to field a number of market studies in business continuity (BC) and disaster recovery (DR) in order to gather data for company comparison and benchmarking and to guide research and publication of best practices and recommendations for the industry. This is the seventh annual joint survey study, and it’s focused on gathering a baseline for company DR preparedness. This study repeated many of the questions that we asked in 2007 and 2010, to determine what has changed in DR during the past several years. Specifically, this study was designed to determine:
  • Company practices regarding DR planning, DR plan maintenance, and DR testing.
  • The percentages of companies that have alternate recovery sites, the number of sites, and the distance between sites.
  • Current recovery objectives and technology selection.
  • Company confidence in DR preparations and preparedness.
  • The most common causes of disaster declarations and downtime and the cost of downtime.
  • Market drivers fueling continued improvement in DR preparedness.
Study Methodology
In the fall of 2013, Forrester Research and the Disaster Recovery Journal (DRJ) conducted an online survey of 96 DRJ members. In this survey:
  • Thirty-seven percent of respondents were from companies that had 0 to 999 employees; 26 percent had 1,000 to 4,999 employees; 17 percent had 5,000 to 19,999 employees; and 20 percent had 20,000 or more employees.
  • All respondents were decision-makers or influencers in regard to planning and purchasing technology and services related to disaster recovery.
  • Respondents were from a variety of industries.
This survey used a self-selected group of respondents (DRJ members) and is therefore not random. These respondents are more sophisticated than the average. They read and participate in business continuity (BC) and disaster recovery publications, online discussions, etc. They have above-average knowledge of best practices and technology in BC/DR. While nonrandom, the survey is still a valuable tool in understanding where advanced users are today and where the industry is headed.
Executive Summary
This year’s survey reveals a mixed bag of DR preparedness. On the one hand, testing and plan maintenance are improving, and more advanced technologies are being used to protect mission-critical applications. On the other hand, we continue to struggle with long recovery times and recovery points, and confidence has fallen off slightly. Barriers and roadblocks include IT and business miscommunication and mismatched expectations of capabilities, while increased risk profiles and costs of downtime continue to fuel our need to drive DR programs forward.
Disaster Recovery Planning, Maintenance, And Testing Make Incremental Improvements
Most experts will agree that running tests and exercises is the best way to ensure preparedness. In the past, the survey has returned disappointing results around organizations’ testing regimens. However, this iteration reveals good news: 39 percent of companies are now running a full test once per year, with another 31 percent running a full test twice per year or more frequently (see Figure 1). Additionally, plans often fail because they are not up to date. With the rapid rate of business and IT change today, it’s critical that companies update their plans continuously, something that 35 percent of respondents now do.
Companies Look To Cloud And Colocation For DR Sites, Separation Is Moderate
DR in the cloud has been a hot topic that has garnered significant attention during the past few years. Until now, adoption has been low, at less than 10 percent. According to the latest survey, however, 15 percent of companies are now using the cloud as a recovery site. Use of colocation for recovery sites is significantly higher as well, at 38 percent. Still, the most common method of sourcing recovery sites is in-house, although 20 percent responded that they use an equal mix of in-house and outsourced models (see Figure 2).
Site separation has also been a hot topic of discussion, especially after Hurricane Sandy proved to many organizations that the separation between their sites was not sufficient. According to the latest survey, the average distance between sites is approximately 600 miles. While there is no absolute right answer for how far apart recovery sites should be, the rule of thumb is that they should not be subject to the majority of the same risks.
Firms Turn To Advanced Technologies To Protect Growing Critical Systems
According to the 2010 Forrester/DRJ survey, the top risk that BC/DR managers face is the increased reliance on technology. This is further illustrated by the fact that more than one third of our systems are now considered mission critical. Compared to past survey results, the number of non-critical systems continues to shrink.
To address increasing business expectations and shrinking RTOs and RPOs, more firms are turning to technologies such as replication for mission-critical systems, which more than half of companies now use, compared to 35 percent in 2010. Legacy technologies like tape still play an important role in continuity plans and remain the most popular method for protecting non-critical systems.
Recovery Time Actuals Lengthen, And Recovery Point Actuals Stay Flat
One of the biggest challenges in DR today is the tension between business expectations for recovery objectives and IT’s ability to deliver on them. In fact, 35 percent of companies responded that the mismatch between business expectations and IT capabilities was one of the biggest challenges they faced when recovering from their most recent disaster or major business disruption. While the business is demanding shorter and shorter recovery times and points, actual recovery times are lengthening: in 2013, the median actual recovery time was eight hours, up from three hours in 2010. This trend of increasing actual recovery times is not new; it can be traced back to the 2007 survey. For example, in 2007, a notable 30 percent of companies reported they were able to recover from their most recent disruption in under an hour. In 2010, that shrank to 13 percent, and today only 2 percent of companies said they were able to recover in less than one hour from their most recent disruption (see Figure 4).
Recovery points, on the other hand, stayed flat to slightly down, most likely due to increased usage of replication, both synchronous and asynchronous. In 2013, the median recovery point actual was 0, the same as it was in 2010. Like the trend of increasing recovery time actuals, the trend in decreasing data loss can be traced through from 2007. For example, in 2007, 19 percent of companies said they sustained between one and five hours of data loss. This grew slightly in 2010 to 21 percent and then to 27 percent in 2013, while those who could recover with less than one hour of data loss remained effectively flat.
More Disasters Are Declared, Power Failures Remain The Top Culprit
For those who think “it won’t happen to me,” think again. According to the latest survey, one in three companies has declared a disaster in the past five years. In 2010, the statistic was one in five. Overall, only 31 percent of firms say they have never declared a disaster, down from 36 percent in 2010.
While it may be tempting to blame disaster declarations on the large-scale natural disasters of the past few years, such as Hurricane Sandy or the Japanese tsunami, it’s still mundane events such as power failures, IT failures, and human error that top the list of causes (see Figure 5).
Most organizations still struggle to understand their cost of downtime: 57 percent said their organizations had not calculated this, and another 30 percent said it had been calculated, but they didn’t know what it was. Those who did know their hourly cost of downtime gave answers in the range of $10,000 to $3.5 million. One area organizations did understand, however, was the impact of downtime on their organization. The biggest impact on organizations was loss of productivity, followed by lost business opportunities and a drop in employee morale.
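For organizations that have not yet calculated an hourly cost of downtime, a rough first estimate can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the survey: the two-part formula (lost revenue plus lost productivity) and every name and figure below, including DowntimeInputs and the sample values, are hypothetical assumptions. It ignores harder-to-quantify impacts such as lost business opportunities and employee morale.

```python
from dataclasses import dataclass


@dataclass
class DowntimeInputs:
    """Hypothetical inputs; every field is an illustrative assumption."""
    annual_revenue: float          # revenue that depends on the affected systems
    revenue_hours_per_year: float  # hours per year in which that revenue is earned
    affected_employees: int        # staff who cannot work during the outage
    loaded_hourly_wage: float      # average fully loaded cost per employee-hour
    productivity_loss: float       # fraction of work lost while down (0.0 to 1.0)


def hourly_downtime_cost(inputs: DowntimeInputs) -> float:
    """Rough hourly cost: lost revenue plus lost productivity."""
    lost_revenue = inputs.annual_revenue / inputs.revenue_hours_per_year
    lost_productivity = (
        inputs.affected_employees
        * inputs.loaded_hourly_wage
        * inputs.productivity_loss
    )
    return lost_revenue + lost_productivity


def outage_cost(inputs: DowntimeInputs, hours_down: float) -> float:
    """Total cost of an outage lasting hours_down hours."""
    return hourly_downtime_cost(inputs) * hours_down


# Made-up example company, not survey data.
example = DowntimeInputs(
    annual_revenue=50_000_000,
    revenue_hours_per_year=2_000,
    affected_employees=400,
    loaded_hourly_wage=50.0,
    productivity_loss=0.5,
)

print(hourly_downtime_cost(example))  # 35000.0
print(outage_cost(example, 8))        # 280000.0
```

With these made-up inputs, the estimate is $35,000 per hour, or $280,000 for an eight-hour outage, the median actual recovery time reported in this year's survey. Even a coarse figure like this gives IT and the business a shared number to discuss when setting recovery objectives.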
Confidence In Capabilities Erodes, Regulatory Compliance Drives Future Improvements
Given the longer recovery times, more critical systems, and increased complexity, it’s no surprise that confidence in our DR preparedness has fallen during the past few years. Today, our confidence in our ability to meet recovery objectives is significantly lower than it was in 2010, with 16 percent of respondents in 2013 saying they felt very prepared, down from 23 percent in 2010. However, this most likely represents a more balanced and realistic viewpoint (see Figure 6).
Nevertheless, firms overall agree that there is room for improvement, and 40 percent say improving DR capabilities is a critical priority. The drivers behind this motivation, however, have changed significantly since 2010. Regulatory and legal drivers, which ranked fifth overall in drivers in 2010, are now top of the list. Other top drivers include many of the usual suspects, including fiduciary responsibility to stakeholders and increased overall risk.
Rachel Dines is a senior analyst with Forrester Research. Dines’ research focus is on IT continuity and DR services and technologies, next-generation high availability and backup, and data center strategies.