Data Center is our focus

We help you build, access, and manage your data center and server rooms

Structured Cabling

We help structure your cabling: fiber optic, UTP, STP, and electrical.

Get ready for the #Cloud

Start your Hyper-Converged Infrastructure.

Monitor your infrastructure

Monitor your hardware, software, and network (ITOM), and maintain your ITSM services.

Our Great People

Great team to support happy customers.

Sunday, October 19, 2014

Turning Business Continuity into Added Value

Turning Business Continuity Into A Competitive Advantage

Written by Dennis Wenk, January 13, 2014
The pace and complexity of technological change are challenging traditional business continuity paradigms. What was once considered a best practice in business continuity (BC) no longer serves the new digital world, and organizations can't rely on these outdated processes to reach their future objectives. These best practices, and the standards and guidelines on which they are based, are unsuitable for the modern technologically dependent organization because they were intended to serve a different purpose within a vastly different business environment. Some might question or be puzzled by the notion that long-standing and widely accepted best practices could be unreliable; however, that really shouldn't be so disturbing. After all, blood-letting was once a medical best practice.
It is time to modernize business continuity and align it with the genuine needs of today's technologically dependent organization. Today almost everything an organization does relies in some form or other on information technology. Businesses use IT to link to customers, suppliers, and partners; to increase their operating efficiencies; to connect global supply chains; and more. With advancements in IT, we now do more transactions online, of greater value, and faster than ever before. It could be said that the modern organization is entirely dependent on IT. In a world of thousands of servers processing petabytes of data across hundreds of miles of networks in nanoseconds, unforeseen one-in-a-million glitches can happen in the blink of an eye. The complexity of today's IT is vastly different from the homogeneous IT environment that was in place when many business continuity best practices were designed. Relying on these old best practices for your business continuity strategy creates blind spots that may lead to significant oversights, ones that profoundly affect the reliability of the overall strategy.
The prevailing best practices in business continuity have favored a "better safe than sorry" approach to dealing with risk. In ordinary life, "better safe than sorry" seems quite sensible; it does seem better to be safe. However, this paradigm does not work when the cost of the safety is greater than the cost of the risk.
Safety is not an all-or-nothing condition. Risk comes in degrees, and mitigation actions involve a variety of trade-offs. There are times when this perception of safety creates blind spots that lead us astray and cause us to overspend or waste valuable resources; feeling safe is not the same as being safe. As Robert Hahn, author of "Making Sense of Risk," pointed out to Congress: "This leads to a paradox that is becoming increasingly recognized. Large amounts of resources are often devoted to slight or speculative dangers while substantial and well-documented dangers remain unaddressed."
We can't bet our organization's valuable, scarce resources on intuition and rules of thumb. The harm is that when resources are disproportionately allocated based on precautionary heuristics, those resources are not available for less obvious but potentially more harmful risks.
Managing continuity in today's complex, IT-dependent organization requires replacing "better safe than sorry" heuristics with optimal risk-reduction actions. Managing risk depends on measuring the size of the investment and the speculativeness of the harm. The potential negative consequences of catastrophic events such as floods, fires, hurricanes, tornadoes, earthquakes, terrorism, pandemics, or a meteor strike are quite significant. The question is not whether these events are hazardous or whether they should be of interest to an organization. It is obvious that the loss of life and resources from catastrophic events can cripple a business, and the cost of being unprepared for such an event is equally obvious, but capitalism is not about doomsday prepping. Capitalism is about calculated risk-taking: no risk-taking, no innovation, no competitive advantage, and no shareholder value. Congressman Michael G. Oxley points out in a House financial report that "Capitalism is about taking risk, and that is what makes our system so productive."
The big question, the one the precautionary principle does not and cannot answer for business continuity, is: when should we stop spending resources on safety?
Many business continuity best practices conceal the precautionary bias by using legitimate-sounding terms such as risk appetite, risk tolerance, and risk aversion, but these terms are never developed beyond heuristics and subjective judgment. They are ordinary perceptions of risk; they are neither measurements of risk nor can they be used to calculate it. They simply tell us how we feel about risk.
Other business continuity best practices mask their subjectivity and bias through elaborate high-medium-low (HML) matrix models. These tools don't calculate risk; they merely rank perceptions of risk, providing none of the credible information or statistical grounding needed to make a rational decision about how to optimally reduce risk. These models describe how we feel about risk, which does not help answer "what should we do?" or "how much should we spend?"
The precautionary bias is peppered throughout the many business continuity standards, guidelines, best practices, and certifications. Today a balanced approach to business continuity is more important than ever, and precautionary guidelines that consistently ignore minor cracks in continuity will not serve that purpose. Our organizations would be better served if business continuity first looked for ways to proactively fill those continuity cracks rather than solving for the next apocalypse. All in all, "a stitch in time saves nine."
The real problem with traditional approaches is not that they are wrong, but that they offer no guidance to modern organizations on how to optimally reduce risk; how to fill the cracks. The unintended consequence of these outdated business continuity methods has been that the operational aspects of IT have been systematically neglected, and this might be the biggest blunder in business continuity today.
With all these best practices, HML matrix models, and talk of risk aversion, there is a growing and significant disconnect from what is actually happening in our new digital world. Business continuity routinely dismisses IT risks in favor of the prevailing "risk of the month" because its best practices have a close affinity to the precautionary bias. While few would dispute that IT is becoming increasingly important to every organization, a business continuity certification consultant recently stated at an industry event that "the ultimate goal of BC activities was to get out of the data center."
That is an antiquated notion that undoubtedly implies the IT-infrastructure is unworthy of serious attention from business-oriented BCM practitioners. Nothing could be further from the truth.
The precautionary bias, coupled with people's fear, triggers perceptions of worst-case scenarios that make them appear increasingly plausible. In 2008-2009, the United States suffered a major financial meltdown, one with an impact that many economists have estimated at $1.8 trillion.
While we intuitively understand the consequences of a loss at that scale, most of us fail to recognize the extent of a silent IT disaster unfolding under our virtual noses. According to IT complexity expert and ObjectWatch founder Roger Sessions, organizations in the United States lose $1.2 trillion to IT failures every year; worldwide, the total comes to $6.2 trillion. Although Sessions' numbers have been challenged by other economists, even the lower competing estimate of roughly $3.8 trillion worldwide remains sobering.
The most notable aspect of Sessions' math is this: the overwhelming majority of the annual $1.2 trillion loss is caused not by the low-probability/high-consequence catastrophes that capture attention, but by high-probability/low-consequence events that occur frequently, such as software bugs, hardware failures, and security breaches. Worse, as applications become more complex, involving an ever-larger tangle of code, data nodes, and networked systems, these "smaller" events become more frequent and their impact more costly.
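The arithmetic behind this point is easy to sketch. Using hypothetical figures (not Sessions' data), annualizing each event class as frequency times impact shows how frequent small failures can dominate a rare catastrophe:

```python
# Back-of-the-envelope annualized loss expectancy (ALE) for several event
# classes. All frequencies and impacts below are hypothetical illustrations.

def annualized_loss(frequency_per_year, impact_usd):
    """How often an event occurs per year, times what each occurrence costs."""
    return frequency_per_year * impact_usd

events = {
    "software bug outage":  annualized_loss(24, 50_000),       # ~twice a month
    "hardware failure":     annualized_loss(12, 80_000),
    "security breach":      annualized_loss(2, 400_000),
    "regional catastrophe": annualized_loss(0.02, 25_000_000), # ~once in 50 years
}

# Ranked by expected annual loss: the frequent, low-consequence events dominate.
for name, loss in sorted(events.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} ${loss:>12,.0f}/year")
```

On these assumed numbers, routine software bugs cost more per year than the once-in-fifty-years catastrophe, which is exactly the mis-ranking an HML matrix tends to produce.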
The sheer size of these losses due to IT failure should serve as a wake-up call for anyone involved in business continuity. How could the very practices that were intended to provide continuity for our organizations allow interruptions that generate losses of this magnitude? Either business continuity's target or its aim has been considerably off. While business continuity has been waiting and preparing for a catastrophic event, it has ignored the real risk to continuity: IT. Business continuity best practices absolutely must start to do things differently. We need to think rationally about where to devote our efforts and where to place our emphasis. Genuine business continuity best practices must make certain that real and serious risks receive the attention they deserve.
The big question, as we discussed earlier, covers optimization of scarce resources in the present to achieve the greatest benefit for our organization in the future. After all, it is not about turning the lights back on once they fail; continuity is about ensuring the lights never go off in the first place.
For business continuity the big question has two components: (1) which risks are the serious ones, and (2) what are the optimal risk-reduction actions? Traditional methods currently used in business continuity offer little help in answering it. In fact, the current set of heuristics can be dysfunctional because it unknowingly diverts resources toward slight or speculative dangers.
Many in the business continuity community share the mistaken belief that it is impossible to develop credible quantitative risk estimates. That belief is illusory: real-world experience shows there is a wealth of data on which to base quantitative risk estimates with a degree of precision and confidence sufficient to support sound management decisions. We don't have to be perfect; in fact we can't be perfect, and perfection is infinitely expensive. We do need to increase the probability of success by reducing our losses, and we need to apply a level of discipline appropriate to the complexity of the situation. IT is too complex for heuristics, rules of thumb, and intuitive judgment.
While precise information is extremely valuable, even small amounts of reliable data are far better than subjective value judgments when it comes to making sound decisions about managing IT-infrastructure risk. Risk-related data is increasingly available; there is a surprising amount of information from which to make realistic calculations and estimates about IT infrastructure and operational risks. As Immanuel Kant said, "We have a duty – especially where the stakes are large – to inform ourselves adequately about the facts of the situation." All in all, it is far better to use empirical data than to rely on intuitive, subjective judgments.
Business continuity must make informed estimates about future losses and then take appropriate action based on those estimates. The underlying economic models must be constructed to accurately portray all of the pertinent risk parameters, as opposed to measuring risk-perceptions. Cost-benefit balancing can be applied to ensure a proper proportional-response. To keep the odds in our favor we must economically quantify the operational risks of the IT-infrastructure so we can properly evaluate the many tradeoffs and reach the optimal risk-reduction solution for our organizations.
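As an illustration of such cost-benefit balancing, a minimal sketch (every probability, impact, and cost below is hypothetical) funds a mitigation only while the expected loss it avoids exceeds its cost, which also answers the earlier question of when to stop spending:

```python
# Cost-benefit sketch: fund a mitigation only if the expected annual loss it
# avoids exceeds its annual cost. Every figure below is hypothetical.

def net_benefit(p_before, p_after, impact_usd, annual_cost_usd):
    """Expected annual loss avoided by the mitigation, minus its cost."""
    return (p_before - p_after) * impact_usd - annual_cost_usd

mitigations = [
    # (name, annual event probability before, after, impact, annual cost)
    ("redundant power feed",  0.30, 0.05, 1_000_000,  60_000),
    ("second DR data center", 0.02, 0.01, 5_000_000, 900_000),
    ("automated patching",    0.50, 0.20,   200_000,  25_000),
]

funded = [(name, net_benefit(pb, pa, imp, cost))
          for name, pb, pa, imp, cost in mitigations
          if net_benefit(pb, pa, imp, cost) > 0]

# On these numbers the second DR site fails the test: that is where
# spending on safety should stop.
for name, nb in sorted(funded, key=lambda kv: -kv[1]):
    print(f"fund: {name} (net benefit ${nb:,.0f}/year)")
```

The point is the shape of the calculation, not the numbers: each candidate action is compared against the risk it actually removes, rather than against how frightening the scenario feels.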
With $3 to $6 trillion a year at stake, understanding how to prevent the continuing spiral of IT failures will have substantial benefits. In these difficult economic times, there is a tremendous amount of goodness that $3 to $6 trillion could add to our global economy. Making rational decisions about calculated risks which reduce the economic impact of IT failures will be key to achieving a competitive advantage.
Dennis Wenk is a senior manager in competitive strategy and market insights for Symantec, covering cloud, virtualization, and big data. He has consulted with large Fortune 500 companies in more than 20 countries.

Log Management Is Useful When Disaster Strikes

Log Management: When Disaster Strikes You’ll Be Glad You Did

Written by Misha Govshteyn, January 13, 2014
How Log Management Can Enhance Infrastructure Security in Disaster Recovery Situations
When an enterprise network goes down – because of natural disaster, accidental system failure, or security breach – the first priority is identifying the impact and restoring the infrastructure to its pre-disaster state. However, how “clean” is the pre-disaster infrastructure? Have servers been tainted with malware that can replicate to multiple data systems on the disaster recovery site and allow security breaches? Were data confidentiality, integrity, and availability compromised in any way, either before or during the incident? And would you have the data required to troubleshoot these issues during an outage?
The bad news is that such security breaches often go undetected for weeks or months after they occur. However, log management best practices can provide those answers and help you determine whether you're working with a compromised infrastructure following disaster recovery (DR). Moreover, compliance with PCI, Sarbanes-Oxley, HIPAA, and GLBA regulations demands that log data be collected, regularly reviewed, and securely archived. The need for effective log management may seem obvious, but surprisingly, the value of log forensics for supporting infrastructure security is often overlooked. This article looks at best practices for using log data analysis to enhance the overall security and availability of data after disaster recovery, and at the advantages of automated log management and analysis delivered as a service.
The Value of Log Management
Log management is an infrastructure management best practice that supports not only performance management but also security incident response and compliance requirements. It is a complex process of collecting all sources of data in an enterprise environment and making it usable in a normalized, searchable format. Reviewing and analyzing log data regularly is a best practice for meeting compliance regulations, identifying suspicious activity, and generating forensic data for internal investigations. Properly collected, stored, and analyzed log data can provide a holistic view of your data flows, and most importantly, alert you to anomalies that could indicate security breaches. Log analysis can reveal unauthorized system access, failed logins, file access records, malware activity, botnet activity, and other either failed or successful attempts to hack or steal data.
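To make the idea of a "normalized, searchable format" concrete, here is a minimal sketch; the two log formats, field names, and sample lines are hypothetical, not a real product's schema:

```python
import re

# Hypothetical normalizer: two raw log styles reduced to one searchable
# record shape. The formats and field names are illustrative only.

SSH_RE = re.compile(r"^\w{3} +\d+ [\d:]+ \S+ sshd.*Failed password for (\w+)")
APP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:]+ level=warn event=(\w+) user=(\w+)")

def normalize(line):
    m = SSH_RE.match(line)
    if m:
        return {"source": "sshd", "event": "failed_login", "user": m.group(1)}
    m = APP_RE.match(line)
    if m:
        return {"source": "app", "event": m.group(1), "user": m.group(2)}
    return None  # unrecognized; a real system would retain the raw line

raw = [
    "Feb  3 10:12:01 host1 sshd[412]: Failed password for root from 10.0.0.9",
    "2014-01-13T10:12:05 level=warn event=config_change user=alice",
]
records = [r for r in (normalize(line) for line in raw) if r]

# Once every source shares one record shape, queries become trivial:
failed_logins = [r for r in records if r["event"] == "failed_login"]
```

The value is in the shared record shape: once disparate sources map to the same fields, a single query can search across all of them.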
However, effective log management remains a challenge for many companies as the size and types of data sources continue to multiply exponentially across the enterprise. When performed manually, log management requires investments in additional IT staff and product acquisitions, both of which are cost prohibitive to many organizations and ultimately often fail to deliver the consistent, in-depth analysis required for DR infrastructure security.
Automated log management in the cloud as a SaaS offering is overcoming these challenges by simplifying the implementation, management, and analysis of log data, especially in the critical areas of analysis and reporting. This approach provides the consistent, automated collection, normalization, and storage that is critical to the confidentiality, integrity, and availability of enterprise data. Moreover, automated log management gives companies the agility and flexibility to collect and manage data from today’s virtual servers, elastic cloud environments, and hybrid environments, and integrate this data with traditional on-premise sources. The most effective SaaS solutions will also enhance automated functionality with the services of live security experts who can translate complex log data into actionable insight for protecting the infrastructure against security threats.
Recommended log management best practices:
  • Include log management in the incident response plan – Log management is most effective as an infrastructure security measure when it is a planned component of the incident response plan, not an afterthought during the chaotic hours after an incident. The core process is consistent collection and analysis of log information from all data sources.
  • Store log data securely off-site to ensure availability – Log information can be an attractive target for malicious hackers. Maintain log data securely offsite just as you would your core data to ensure its availability and integrity during a disaster incident.
  • Alert on key activities to get warnings of unusual activity – Beyond its use for after-the-fact forensics, log management can also be a key “early warning system” against possible breaches in progress that could replicate onto a DR infrastructure. In addition to typical log types covering logins and administrator actions, an automated log management system can support infrastructure security by including the following log collections and alerts:
    • Anti-malware software – These logs can indicate that malware was detected, disinfection attempt results, file quarantines, when file-system scans were last performed, when anti-virus signature files were last updated, and when software upgrades have taken place.
    • Applications – Logs can include account changes, user authentication attempts, client and server activity, and configuration changes.
    • Authentication servers – These typically log each and every authentication attempt showing the originating user ID, destination system or application, date and time, and success/failure details.
    • Firewalls – These very detailed and informative logs can show what activity was blocked according to security policies.
    • Intrusion detection and protection systems – These systems record detailed information about suspicious behavior and detected attacks as well as actions taken to halt malicious activity in progress.
    • Network access control servers – These logs can provide a great deal of useful information about both successful/permitted and unsuccessful quarantined network connections.
    • Network devices (routers, switches) – Network device logs can provide information on network communication activity and what types of traffic were blocked.
    • Operating systems – Beyond typical log entries, operating system logs can contain information from security software and system applications that can help identify suspicious activity involving a particular host.
    • Virtual private networks (VPNs) – VPN logs record both successful and failed connection attempts, date and time of connects and disconnects, and the types and amount of data sent and received during a session.
    • Vulnerability management software – Scanning and patch management software logs entries such as configuration changes, missing software updates, identified vulnerabilities, and patch/scan currency.
    • Web application firewalls – WAFs generate “deny logs” which identify blocked application requests, useful in identifying attempted attacks that included applications as a possible entry into systems.
    • Web proxies – Web proxy logs record user activity and URLs accessed by specified users.
  • Have experienced analysts regularly review log data – Warnings of possible threats to the infrastructure are embedded in all of the log data flowing through the above systems. Regular log analyses can reveal them and trigger preventive action. Few companies can afford the time and cost to have in-house IT staff with the expertise to sift through thousands of log entries per day and detect anomalies. But the powerful analytic engines in today’s automated log management systems, combined with the expertise of live security analysts in a Security-as-a-Service environment, can quickly collect and analyze log data to deliver actionable results.
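An automated alert of the kind described above, for example on a burst of failed admin logins, can be sketched in a few lines; the threshold and window values are illustrative, not recommendations:

```python
from collections import deque

# Sketch of a sliding-window alert rule: flag a burst of failed admin
# logins inside a short time window. Parameters are illustrative only.

def make_failed_login_alerter(threshold, window_seconds):
    recent = deque()  # timestamps (in seconds) of recent failures
    def on_failed_login(ts):
        recent.append(ts)
        while recent and ts - recent[0] > window_seconds:
            recent.popleft()  # drop failures that fell out of the window
        return len(recent) >= threshold  # True => raise an alert
    return on_failed_login

alert = make_failed_login_alerter(threshold=3, window_seconds=60)
results = [alert(t) for t in [0, 10, 20, 200, 210, 215]]
# each burst of three failures within a minute trips the rule
```

A production system would feed this from the normalized log stream and route the alert to analysts; the sliding window is what separates a genuine burst from ordinary background noise.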
Finding the Needle in the Haystack
Outside of an emergency incident, automated log management systems also support the availability of enterprise data by parsing and normalizing all of the large, multiple flows of diverse log data to make it easily searchable. As such, it is possible to find in seconds that series of failed login requests or privilege escalation that led to a problem later. However, log management systems really pay off when trying to find the “needle in the haystack” during suspicious network incidents and dealing with compliance issues. The following use cases show how log management best practices can enhance infrastructure security and daily compliance tasks alike:
Use case #1: Detection of unauthorized changes to domain policies
A review of log data tracked changes to an administrator account, but the administrator credibly claimed not to have made the changes. Further analysis traced the login to a known attacker who used stolen credentials. Local logs had been deleted, but the customer had secure offsite data available via a log management solution. To prevent further breaches, the customer set an automated alert for admin-level changes. In addition, the daily log review analysis function in the solution “watches” for other suspicious activity.
Use case #2: Finding audit information quickly
Compliance regulations required that a business identify failed login attempts on admin accounts and demonstrate to an auditor that these attempts were identified. Finding the failed attempts was difficult and time consuming. Using a log review function, a daily analyst report on failed admin login attempts could be generated. These reports are stored for a year and are easily available in seconds to show compliance with a daily log review mandate.
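The daily report described in this use case amounts to a simple aggregation over normalized log records. A sketch, with a hypothetical record shape and account names:

```python
from collections import Counter

# Sketch of the daily audit report: failed admin logins counted per day
# and account. The record shape and account names are hypothetical.

ADMIN_ACCOUNTS = {"admin", "root"}

records = [
    {"day": "2014-01-13", "event": "failed_login", "user": "admin"},
    {"day": "2014-01-13", "event": "failed_login", "user": "admin"},
    {"day": "2014-01-13", "event": "login",        "user": "alice"},
    {"day": "2014-01-14", "event": "failed_login", "user": "root"},
]

report = Counter((r["day"], r["user"]) for r in records
                 if r["event"] == "failed_login" and r["user"] in ADMIN_ACCOUNTS)

for (day, user), n in sorted(report.items()):
    print(f"{day}  {user}: {n} failed attempt(s)")
```

Stored day by day, output like this is exactly the artifact an auditor asks for when verifying a daily log review mandate.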
Be Proactive with Log Management
What you don’t see can hurt you. In the end, including log management in your incident response plan is a proactive way to gain deeper visibility into your infrastructure and protect against the replication of security breaches during disaster recovery. Make a commitment to consistent collection of all log data from all disparate sources – on-premise, virtual, and cloud – and to consistent analysis that will give you actionable insight into keeping your data always secure and available.
Misha Govshteyn is chief strategy officer and co-founder of Alert Logic, a leading provider of security-as-a-service solutions for the cloud. Govshteyn co-founded Alert Logic in 2002 and is responsible for security strategy, security research, and software development at Alert Logic.

The State of IT Resiliency and Preparedness

Written by Rachel Dines, January 13, 2014
Forrester Research and the Disaster Recovery Journal have partnered to field a number of market studies in business continuity (BC) and disaster recovery (DR) in order to gather data for company comparison and benchmarking and to guide research and publication of best practices and recommendations for the industry. This is the seventh annual joint survey study, and it’s focused on gathering a baseline for company DR preparedness. This study repeated many of the questions that we asked in 2007 and 2010, to determine what has changed in DR during the past several years. Specifically, this study was designed to determine:
  • Company practices regarding DR planning, DR plan maintenance, and DR testing.
  • The percentages of companies that have alternate recovery sites, the number of sites, and the distance between sites.
  • Current recovery objectives and technology selection.
  • Company confidence in DR preparations and preparedness.
  • The most common causes of disaster declarations and downtime and the cost of downtime.
  • Market drivers fueling continued improvement in DR preparedness.
Study Methodology
In the fall of 2013, Forrester Research and the Disaster Recovery Journal (DRJ) conducted an online survey of 96 DRJ members. In this survey:
  • Thirty-seven percent of respondents were from companies that had 0 to 999 employees; 26 percent had 1,000 to 4,999 employees; 17 percent had 5,000 to 19,999 employees; and 20 percent had 20,000 or more employees.
  • All respondents were decision-makers or influencers in regard to planning and purchasing technology and services related to disaster recovery.
  • Respondents were from a variety of industries.
This survey used a self-selected group of respondents (DRJ members) and is therefore not random. These respondents are more sophisticated than average: they read and participate in business continuity (BC) and disaster recovery publications, online discussions, and the like, and they have above-average knowledge of best practices and technology in BC/DR. While nonrandom, the survey is still a valuable tool for understanding where advanced users are today and where the industry is headed.
Executive Summary
This year’s survey reveals a mixed bag of DR preparedness. On the one hand, testing and plan maintenance is improving, and more advanced technologies are being used for protecting mission critical applications. However, we continue to struggle with long recovery time and recovery points, and confidence has fallen off slightly. Barriers and roadblocks include IT and business miscommunication and mismatched expectations of capabilities, while increased risk profiles and costs of downtime continue to fuel our need to drive DR programs forward.
Disaster Recovery Planning, Maintenance, And Testing Make Incremental Improvements
Most experts agree that running tests and exercises is the best way to ensure preparedness. In the past, surveys have returned disappointing results about organizations' testing regimens. However, this iteration reveals good news: 39 percent of companies now run a full test once per year, and another 31 percent run a full test twice per year or more frequently (see Figure 1). Additionally, plans often fail because they are not up to date. With the rapid rate of business and IT change today, it's critical that companies update their plans continuously, something that 35 percent of respondents now do.
Companies Look To Cloud And Colocation For DR Sites, Separation Is Moderate
DR in the cloud has been a hot topic that has garnered significant attention during the past few years, but to date adoption has been low: less than 10 percent. According to the latest survey, however, 15 percent of companies are now using the cloud as a recovery site. Use of colocation for recovery sites is significantly higher as well, at 38 percent. Still, the most common method of sourcing recovery sites is in-house, although 20 percent responded that they use an equal mix of in-house and outsourced models (see Figure 2).
Site separation has also been a topic of hot discussion, especially after Hurricane Sandy proved to many organizations that the separation between their sites was not sufficient. According to the latest survey, average distance between sites is approximately 600 miles. While there is no absolute right answer for how far apart recovery sites should be, the rule of thumb is that they should not be subject to the majority of the same risks.
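As a quick sanity check on site separation, the great-circle distance between two sites can be computed with the haversine formula; the coordinates below are merely illustrative:

```python
from math import radians, sin, cos, asin, sqrt

# Great-circle (haversine) distance between a primary and a recovery site,
# in statute miles. The coordinates below are illustrative only.

def distance_miles(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))  # 3958.8 ~= Earth's radius in miles

# e.g. a primary site near New York and a recovery site near Chicago
d = distance_miles(40.71, -74.01, 41.88, -87.63)  # roughly 700 miles
```

Raw distance is only a proxy, of course; what matters is whether the two sites share the same flood plains, fault lines, storm tracks, and power grid.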
Firms Turn To Advanced Technologies To Protect Growing Critical Systems
According to the 2010 Forrester/DRJ survey, the top risk BC/DR managers face today is increased reliance on technology. This is further illustrated by the fact that more than one third of our systems are now considered mission critical. Compared to past survey results, the number of non-critical systems continues to shrink.
To address increasing business expectations and shrinking RTOs and RPOs, more firms are turning to technologies such as replication for mission critical systems, which more than half of companies now use, compared to 35 percent in 2010. Legacy technologies like tape still play an important role in continuity plans and remain the most popular method for protecting non-critical systems.
Recovery Time Actuals Lengthen, And Recovery Point Actuals Stay Flat
One of the biggest challenges in DR today is the tension between business expectations for recovery objectives and IT's ability to deliver on them. In fact, 35 percent of companies responded that mismatched business expectations and IT capabilities were among the biggest challenges they faced when recovering from their most recent disaster or major business disruption. While the business is demanding ever-shorter recovery times and points, actual recovery times are lengthening: in 2013 the median actual recovery time was eight hours, up from three hours in 2010. This trend of increasing actual recovery times is not new; it can be traced back to the 2007 survey. In 2007, a notable 30 percent of companies reported they were able to recover from their most recent disruption in under an hour. By 2010, that figure had shrunk to 13 percent, and today only 2 percent of companies said they were able to recover in less than one hour from their most recent disruption (see Figure 4).
Recovery points, on the other hand, stayed flat to slightly down, most likely due to increased usage of replication, both synchronous and asynchronous. In 2013, the median recovery point actual was 0, the same as in 2010. Like the trend in recovery time actuals, the trend in data loss can be traced from 2007. In 2007, 19 percent of companies said they sustained between one and five hours of data loss. This grew slightly to 21 percent in 2010 and then to 27 percent in 2013, while the share that could recover with less than one hour of data loss remained effectively flat.
More Disasters Are Declared, Power Failures Remain The Top Culprit
For those who think “it won’t happen to me,” think again. According to the latest survey, one in three companies have declared a disaster in the past five years. In 2010, the statistic was one in five. And overall, only 31 percent of firms say they have never declared a disaster, down from 36 percent in 2010.
While it may be tempting to blame some of the large-scale disasters of the past few years on natural events such as Hurricane Sandy or the Japanese tsunami, it's still mundane events, such as power failures, IT failures, and human error, that top the list of causes (see Figure 5).
Most organizations still struggle to understand their cost of downtime—57 percent said their organizations had not calculated this, and another 30 percent said it had been calculated, but they didn’t know what it was. Those who did know their hourly cost of downtime gave answers in the range of $10,000 to $3.5 million. One area organizations did understand, however, was the impact of downtime on their organization. The biggest impact to organizations was loss of productivity, followed by lost business opportunities and drop in employee morale.
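For organizations that have not yet calculated their cost of downtime, even a rough first-pass model helps. This sketch (all figures hypothetical) combines the productivity and lost-business impacts the survey highlights:

```python
# First-pass hourly downtime cost, combining productivity loss and revenue
# at risk. All inputs are hypothetical; a fuller model would add contractual
# penalties, recovery labor, and reputational effects.

def hourly_downtime_cost(staff_affected, loaded_hourly_wage,
                         productivity_loss_pct, hourly_revenue_at_risk):
    productivity = staff_affected * loaded_hourly_wage * productivity_loss_pct
    return productivity + hourly_revenue_at_risk

# e.g. 500 staff at $60/hour fully loaded, 40% less productive during the
# outage, plus $25,000/hour of revenue at risk
cost = hourly_downtime_cost(500, 60, 0.40, 25_000)
```

Even a crude figure like this turns the downtime conversation from anecdote into a number that can be weighed against the cost of prevention.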
Confidence In Capabilities Erodes, Regulatory Compliance Drives Future Improvements
Given the longer recovery times, more critical systems, and increased complexity, it’s no surprise that confidence in DR preparedness has fallen during the past few years. Confidence in the ability to meet recovery objectives is significantly lower than it was in 2010: 16 percent of respondents in 2013 said they felt very prepared, down from 23 percent in 2010. This most likely represents a more balanced and realistic viewpoint (see Figure 6).
Nevertheless, firms overall agree that there is room for improvement, and 40 percent say improving DR capabilities is a critical priority. The drivers behind this motivation, however, have changed significantly since 2010. Regulatory and legal drivers, which ranked fifth overall in 2010, now top the list. Other top drivers include many of the usual suspects, including fiduciary responsibility to stakeholders and increased overall risk.
Rachel Dines is a senior analyst with Forrester Research. Her research focuses on IT continuity and DR services and technologies, next-generation high availability and backup, and data center strategies.

Disaster Recovery Design Complexity for State-Owned Enterprises (BUMN)

Complexities of DR Design for the Federal IT Enterprise

Written by Mano Malayanur & Julia Nathan, April 24, 2014
Design of disaster recovery (DR) solutions is sometimes defined as the art of balancing the cost of disruption against the cost of recovery. This simple and elegant definition often masks the true complexity of DR. Cost of disruption is rarely measured solely in terms of revenue loss and often includes the damage to intangibles such as reputation. Likewise, the true cost of recovery is hard to quantify, and it is spread across the choices made in information technology (IT) resources, processes, and technologies deployed.
These complexities are magnified in the case of the federal IT enterprise. The lack of a profit motive in federal agencies can make it impossible to build a purely monetary case for DR. In private companies, the cost of a disaster can often be measured in terms of lost revenue and profit; the purpose of a federal IT system, however, is typically not to make a profit or even raise revenue, but to provide a service or support a mission. IT in federal agencies is often very complex, with state-of-the-art technology coexisting with, and dependent on, legacy systems decades old. Since any company complying with the federal acquisition regulation can compete to offer services, federal IT contracts are often fragmented: the people who designed a system may be different from those supporting it and may operate independently from those supporting the dependent systems and the underlying infrastructure. Agencies are also subject to many federal mandates, such as the Federal Data Center Consolidation Initiative, the Federal Cloud Computing Strategy, guidelines from the National Institute of Standards and Technology (NIST), and directives from the Office of Management and Budget (OMB), all of which affect IT investment strategy. Effective DR under these conditions requires an enterprise-wide approach so that expectations are clearly set for all involved.
BIAs and the Headline Risk
No agency leader relishes the prospect of making newspaper headlines as a result of a failed IT system. Failure of a key federal IT system carries with it the risk of administration and Congressional inquiries and exposure to the public, with bad consequences to follow. This consideration is often an excellent motivation for implementing an effective DR solution. DR solution design starts with a business impact analysis (BIA) to identify critical systems, gain a realistic understanding of the end user workflow, determine business requirements, and establish recovery needs. In complex federal IT environments, it is crucial to understand the dependencies between IT systems at this stage. This task can be especially challenging given the fragmented nature of federal IT support. It does no good for a critical system to be able to recover in minutes if a system it depends on takes hours to recover.
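The dependency point can be made concrete: a system’s effective RTO is bounded below by the slowest system anywhere in its dependency chain. The sketch below walks a dependency graph to find that bound; the system names, RTO values, and dependencies are hypothetical examples, not from any agency.

```python
# Sketch: the effective RTO of a system is bounded below by the slowest
# system in its dependency chain. Names and RTO values are hypothetical.

def effective_rto(system, rto, deps, _seen=None):
    """Longest recovery time (hours) along any dependency path."""
    if _seen is None:
        _seen = set()
    if system in _seen:          # guard against cyclic dependencies
        return 0
    _seen.add(system)
    downstream = [effective_rto(d, rto, deps, _seen) for d in deps.get(system, [])]
    return max([rto[system]] + downstream)

rto = {"claims-portal": 0.25, "auth-service": 1.0, "legacy-mainframe": 8.0}
deps = {"claims-portal": ["auth-service", "legacy-mainframe"],
        "auth-service": []}

# The portal itself can recover in 15 minutes, but because it depends on
# the mainframe, end users experience an 8-hour outage.
print(effective_rto("claims-portal", rto, deps))  # 8.0
```

Running this kind of check against the BIA inventory quickly exposes critical systems whose advertised recovery times cannot actually be delivered.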
Federal Standards and Guidelines
Several Federal Information Processing Standards (FIPS) and NIST guidelines need to be taken into account when designing a federal DR solution so that the end product supports them. The following is a sample:
  • FIPS 199 “Standards for Security Categorization of Federal Information and Information Systems”
  • FIPS 200 “Minimum Security Requirements for Federal Information and Information Systems”
  • NIST SP 800-34 “Contingency Planning Guide for Federal Information Systems”
  • NIST SP 800-53 “Recommended Security Controls for Federal Information Systems and Organizations”
In addition, each agency typically has its own policies and guidelines that must also be addressed.
The Right Number of Tiers for DR
Federal enterprises tend to own systems that support varied capabilities, which translate to broad ranges of recovery point objective (RPO) and recovery time objective (RTO). DR tiers need to support the full range as informed by the BIA. The tiers also need to be clearly distinct from each other in cost and technology solution so that they can be explained easily to non-IT customers; this limits the number of tiers to no more than a handful. A notional set of DR tiers may look like the one shown in Table 1.
Tier 1 supports the most aggressive recovery requirements. Solutions tend to be complex, expensive, and are usually not required except in extreme use cases. Agencies may find they have no use for this tier at all. Tier 2 supports slightly less aggressive recovery objectives and should be reserved for the most critical agency systems. Tiers 3 through 5 may be separated by the readiness of the DR infrastructure and the technology used for data recovery. Given the complexity of the federal acquisitions process, pre-existing agreements with infrastructure vendors may be required.
Table 1. Notional DR Tiers (technology solution considerations)
  • Tier 1 (RTO and RPO: seconds or less): Active-active solution across all layers of IT (presentation, application, data, and common infrastructure); usually designed into the application itself.
  • Tier 2: Active-active across presentation and application layers; automated fail-over for the data layer; data replication performed by the application (rather than storage).
  • Tier 3: Fail-over can be manual; hardware and common infrastructure ready to be fired up at the DR site; SAN-level or database-level data replication would apply.
  • Tier 4 (RTO: one day): Fail-over using backups; hardware and common infrastructure may not be ready at the DR site.
  • Tier 5: Fail-over using off-site backups; hardware can possibly be acquired after the disaster.
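Once tiers are defined, BIA-derived requirements can be mapped to the least aggressive (and typically least expensive) tier that still satisfies them. The tier boundaries below are illustrative assumptions, not the article’s figures:

```python
# Illustrative tier selection: choose the least aggressive tier whose
# RTO/RPO ceilings still meet the BIA requirement.
# Tier boundaries (in hours) are hypothetical examples.
TIERS = [  # (tier, max_rto_hours, max_rpo_hours)
    (1, 0.01, 0.01),   # active-active, near-zero RTO/RPO
    (2, 1,    0.01),   # automated fail-over, application-level replication
    (3, 8,    1),      # manual fail-over, SAN/database replication
    (4, 24,   24),     # restore from backups at a ready DR site
    (5, 72,   24),     # off-site backups, hardware acquired after disaster
]

def select_tier(required_rto, required_rpo):
    """Return the least aggressive tier that still meets both objectives."""
    for tier, max_rto, max_rpo in reversed(TIERS):
        if max_rto <= required_rto and max_rpo <= required_rpo:
            return tier
    raise ValueError("No tier meets the requirement")

print(select_tier(required_rto=8, required_rpo=1))    # 3
print(select_tier(required_rto=48, required_rpo=24))  # 4
```

Encoding the tiers this way also makes the cost conversation with non-IT customers concrete: relaxing an RTO from one hour to one day visibly drops the system into a cheaper tier.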
It’s all About the Data
Federal IT systems often manage critical and sensitive data of national importance. Safekeeping of that data is crucial. That consideration affects the DR technical solutions in two significant ways.
Data loss: Federal data centers tend to be spread across the country, often with the DR site hundreds of miles from the primary site. While that distance, and the network delays it introduces, may not cause performance issues for users of the IT systems, it significantly affects DR design, where milliseconds do matter. Data replication is extremely sensitive to physical distance, which effectively rules out “zero data loss” replication; replication therefore often lags the primary site, and some data loss should be expected in a disaster. DR solutions must account for this.
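The distance sensitivity is back-of-the-envelope physics: a synchronous write cannot be acknowledged faster than the round trip to the DR site. The figures below are approximations for illustration:

```python
# Why distance rules out synchronous ("zero data loss") replication:
# every synchronous write waits for a round trip to the DR site.
# Light in optical fiber travels roughly 200,000 km/s (about 2/3 of c).

def min_round_trip_ms(distance_km, fiber_speed_km_per_s=200_000):
    """Theoretical minimum round-trip latency in milliseconds."""
    return 2 * distance_km / fiber_speed_km_per_s * 1000

# A DR site 800 km (~500 miles) away adds at least ~8 ms to every
# committed write, before any switching or protocol overhead.
print(round(min_round_trip_ms(800), 1))  # 8.0
```

For a database committing thousands of transactions per second, an unavoidable 8 ms per commit is why asynchronous replication, with its attendant data loss, is the practical choice at these distances.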
Data inconsistencies: Because of their complexity, federal systems tend to be highly interdependent, with relevant data spread across multiple systems. After a disaster, it is quite possible for the states of those systems to be inconsistent with one another: replication of one system may lag another significantly, resulting in differing amounts of data loss. These discrepancies can cause major issues on fail-over. DR design needs to address such cases by ensuring that appropriate reconciliation processes and tools are in place.
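A minimal sketch of the check involved: after fail-over, compare each system’s last-replicated timestamp at the DR site and flag pairs whose gap exceeds a reconciliation tolerance. System names, timestamps, and the tolerance are all hypothetical:

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical last-replicated timestamps observed at the DR site.
last_replicated = {
    "orders":   datetime(2014, 4, 24, 10, 59),
    "billing":  datetime(2014, 4, 24, 10, 30),
    "shipping": datetime(2014, 4, 24, 9, 15),
}

def inconsistent_pairs(timestamps, tolerance=timedelta(minutes=15)):
    """Pairs of systems whose replicated states diverge beyond tolerance."""
    return [(a, b) for a, b in combinations(sorted(timestamps), 2)
            if abs(timestamps[a] - timestamps[b]) > tolerance]

for a, b in inconsistent_pairs(last_replicated):
    print(f"{a} and {b} need reconciliation")
```

In practice the reconciliation itself (replaying transactions, flagging orphaned records) is application-specific; the value of a check like this is knowing which interfaces need it before bringing dependent systems online.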
Cloud-Based DR?
At first blush, cloud-based DR looks appealing: instead of having the IT infrastructure sit idle waiting for a disaster to occur, one could leverage the rapid elasticity characteristic of cloud computing and access a pre-developed cloud-based infrastructure after the disaster. This could result in cost savings by avoiding capital expenditure.
On closer scrutiny, this appeal begins to wane. If the cloud is private, the agency still needs to build and maintain the cloud infrastructure for DR. While those resources can be used for other purposes during normal operations, the savings would not be as high as anticipated. If the cloud is public or community, other problems arise: is the agency ready to deploy applications to a public cloud? Cloud deployments in general require many preparatory steps, including pre-existing contracts, trusted connectivity to the cloud provider site, operations and maintenance (O&M) process integration, and security certification per agency policy, including FedRAMP certification, which is required for all federal agency public cloud deployments. [The Federal Risk and Authorization Management Program (FedRAMP) is a unified, government-wide risk management program focused on large outsourced and multi-agency systems. FedRAMP was established to provide a standard approach to assessing and authorizing cloud computing services and products.]
The questions only compound at the individual system level. What are the system security requirements? As of now, FedRAMP covers only Federal Information Security Management Act (FISMA) low- and moderate-rated environments. What is the system’s infrastructure platform? Public and community cloud providers often restrict the choice of technology; for instance, operating systems are typically limited to Linux and Windows. Are the storage systems compatible between the agency data center and the cloud provider? What are the system’s RPO and RTO requirements? The more aggressive the recovery objective, the more resources need to be pre-deployed in the cloud: replicated data requires that storage be pre-allocated, and automated fail-over requires that servers be pre-configured. A high degree of process coordination is required to meet aggressive RTO and RPO requirements, and network connection modifications typically require long lead times, so they need to be in place in advance. The costs of these items whittle away at the basic appeal of cloud-based DR. In general, cloud-based DR may hold better economic appeal for systems with less aggressive recovery objectives.
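That closing economic point can be sketched as a back-of-the-envelope comparison: a dedicated DR site carries a fixed annual cost, while cloud DR cost scales with the fraction of the footprint that must sit pre-deployed year-round to meet the recovery objective. All figures below are hypothetical:

```python
# Back-of-the-envelope cost comparison (all figures hypothetical):
# dedicated DR site (fixed annual cost) vs. cloud DR, where aggressive
# recovery objectives force more resources to be pre-deployed year-round.

def annual_cloud_dr_cost(pre_deployed_fraction, full_footprint_hourly_cost,
                         hours_per_year=8760):
    """Annual cost of keeping a fraction of the DR footprint running."""
    return pre_deployed_fraction * full_footprint_hourly_cost * hours_per_year

dedicated_site = 500_000  # hypothetical fixed annual cost of a DR site

# Relaxed objectives: only 10% of the footprint (mostly storage) pre-deployed.
relaxed = annual_cloud_dr_cost(0.10, 100)     # 87,600
# Aggressive objectives: 70% pre-deployed (servers configured, data replicated).
aggressive = annual_cloud_dr_cost(0.70, 100)  # 613,200

print(relaxed < dedicated_site < aggressive)  # True
```

Under these assumptions the cloud wins only for the relaxed tier, which mirrors the article’s conclusion that cloud DR suits less aggressive recovery objectives.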
Table 2. DR Solution Example (per-layer solutions across DR tiers, most to least aggressive)

System Class X (three tiers):
  • Presentation: Active/active web servers | Passive web servers | Must be purchased
  • Application: Application clusters | VM replication | Must be purchased
  • Data: Database/SAN replication | SAN replication | Tape backup
  • Common infrastructure: Live DR site | Live DR site | Must be purchased

System Class Y (two tiers):
  • Presentation: Passive web servers | Must be purchased
  • Application: Passive app servers | Must be purchased
  • Data: SAN replication | Tape backup
  • Common infrastructure: Live DR site | Must be purchased
One-Size DR Technology Doesn’t Fit All
Given the wide range of available DR technologies and the diverse requirements of a federal IT infrastructure, a holistic enterprise-wide approach to DR technologies is warranted.
The first step is to classify systems so that a standard DR solution set can be proposed for each class and recovery tier. A system class can be defined by any feature of the system that is significant in a DR context, such as hardware platform or even support organization. For example, the classes could be “Class 1: SPARC-based” or “Class 2: Mainframe.” Such a classification system can also be used to reinforce agency policies on platform choices for applications. A small number of classes is recommended to keep the solution set manageable.
The next step is to constrain the DR technology space in terms of the defined system classes. A complete DR solution consists of a layered selection of technologies that collectively meet the DR requirements of the class. Technologies can be categorized using the following four-layer model:
  • Presentation – Technologies that allow users to access applications. These focus on security, Web access, or content delivery.
  • Application – Technologies that implement the IT system business rules and processes.
  • Data – Technologies that focus on data recovery and replication: database or application-based, host-based, storage-based.
  • Common Enterprise Infrastructure – Technologies or DR considerations related to common services that support multiple applications, such as network services, firewalls, load balancers, Domain Name Service (DNS), and basic data center facilities.
Candidate technologies at each layer can then be organized in terms of the system class that they support. As the technologies are organized, a complete DR solution develops that meets the requirements for each system class. A conceptual example, which is not intended to be a technology recommendation, is shown as Table 2.
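One simple way to hold such a matrix is a mapping from (system class, tier) to a per-layer solution. The entries below echo the conceptual example and are hypothetical, not technology recommendations:

```python
# Hypothetical DR solution matrix: (system class, tier) -> per-layer solution.
# Entries are illustrative only, not technology recommendations.
SOLUTION_MATRIX = {
    ("Class X", 1): {"presentation": "Active/active web servers",
                     "application":  "Application clusters",
                     "data":         "Database/SAN replication",
                     "common_infra": "Live DR site"},
    ("Class X", 3): {"presentation": "Must be purchased",
                     "application":  "Must be purchased",
                     "data":         "Tape backup",
                     "common_infra": "Must be purchased"},
}

def dr_solution(system_class, tier):
    """Look up the standard layered solution for a class and tier."""
    try:
        return SOLUTION_MATRIX[(system_class, tier)]
    except KeyError:
        raise ValueError(f"No standard solution for {system_class}, tier {tier}")

print(dr_solution("Class X", 1)["data"])  # Database/SAN replication
```

A lookup that fails loudly for unlisted combinations is itself useful: it forces project teams to either fit a standard solution or make an explicit case for an exception.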
The result is a DR solution matrix that provides a standard set of enterprise-level solutions while still meeting the needs of a diverse enterprise architecture.
Making the Solutions Stick
Given the sprawling nature of the Federal IT enterprise, every opportunity must be taken to ensure that the DR solutions are adopted by all. Once DR tiers are developed for the enterprise and the DR technology solutions determined, they need to be communicated as widely as possible and integrated tightly into the IT processes.
DR tiers and solutions should be adopted as an agency standard in the agency technology reference guide, if one exists. Compliance with such guides is often required of contractors.
DR solutions should be embedded in the agency Software and System Design Life-Cycles (SDLC). BIAs should be part of the business requirements development, and a standard set of DR requirements should be established for use by the business. DR tier determination should be part of the system design. If the application design needs to play a significant role in the DR solution, software development should include DR features, including approaches to reconcile lost data with other systems. Testing should include making sure that the RPO and RTO requirements of the application can be met.
DR solutions should also be embedded in the agency O&M processes. Changes made to the production configuration should result in corresponding changes in the DR environment, consistent with the DR solutions. Incident and problem management take on a new significance with DR: there need to be detailed processes to fail systems over to the DR site, including data reconciliation with other systems; to operate and maintain the systems in the DR state; and to reconstitute the primary site after the disaster and fail systems safely back. The O&M plan also needs to include regular, rigorous DR testing with realistic scenarios.
Bringing it All Together
Successful DR requires that every link in the chain work effectively, starting with the BIA, DR solution design, implementation of the DR solution through standards and the SDLC, and embedding of DR effectively in the O&M processes. Failure of any one piece is sufficient to cause major issues. These steps become ever more important in the federal IT enterprise, given the complexities of federal IT.

Mano Malayanur is a principal systems engineer at the MITRE Corporation, a not-for-profit company that operates multiple federally funded research and development centers. His areas of focus are infrastructure engineering, operations, and management.
Julia Nathan is a lead systems engineer at the MITRE Corporation. Her areas of focus are infrastructure engineering and human factors.
The authors’ affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions or viewpoints expressed by the authors.

Preparing for the Danger of Floods

'To Be or Not To Be' - Prepared for Floods

Written by Jeremy Duensing & Jeff Johnson, April 25, 2014
It’s safe to say that most of the country has experienced extremely volatile weather within the last few years, and research shows that weather patterns are, and will continue to be, increasingly volatile due to changing climate trends. Whether it’s dauntingly cold temperatures or unexpected heavy snowfall, it’s no longer a question of whether severe weather will occur, but when, and when it will affect your assets.
The risk of floods is increasing as the climate becomes more volatile, and now more than ever it is important to understand what led us to this point and how to stay ahead of flood waters. Information is power, and preparation is key to protecting assets in an emergency.
Climate Change – Human vs. Natural Factors
The earth’s climate is influenced both by natural and human factors, with the impact of each still a hotly debated topic. Natural factors that cause climate variability influence the atmospheric circulation patterns. Seasonal ocean cycles like El Niño and La Niña create wetter or drier and colder or warmer patterns when the prevailing jet stream wind shifts locations.
The North Atlantic Oscillation is another naturally occurring seasonal influence: this upper-level circulation pattern in the polar latitudes can produce abnormally warm and cold periods throughout North America depending on its location. Ocean temperature patterns, volcanoes, and the sun’s cycles can also influence the climate.
Deforestation, agriculture, and urbanization are several human factors that can influence climate change. When a city replaces a forest, the land’s temperature is raised due to increased heat absorption and retention while agriculture provides the opposite effect of cooling an area, especially when irrigated. Increased run-off from land-use changes more quickly fills streams and rivers.
Impacts of Climate Change
Communities, infrastructure, and governments are all at risk of negative impacts due to a changing climate. One of the most devastating results is the increase in the frequency of volatile weather events such as flooding. Scientists have taken note of a greater number of the more extreme weather occurrences in recent years. New research suggests that warming in the arctic region is resulting in a weaker temperature contrast that is slowing down the jet stream. This has led to slower moving weather systems or ones that have been persisting for longer periods of time over the same areas. Extreme weather tends to occur more often with these stalled weather systems.
In 2007, the Intergovernmental Panel on Climate Change (IPCC) concluded that intense rain events have become more frequent in the last 50 years. This, combined with increased run-off from land use changes, means more significant flash flood events can and will occur. In addition, as the oceans warm, scientists predict that the number of hurricanes, and potentially their intensity, could increase. Higher sea levels from warmer waters also can lead to greater storm surges.
According to the U.S. Geological Survey, flooding already costs the U.S. more than $7 billion in damages and kills more than 90 people annually. Increased flooding has the potential to wreak even more havoc on dense populations, with 3,800 towns and cities in the flood plain.
It is not surprising, then, that floods claimed more lives and caused more property damage than any other type of natural disaster during the last century. A 2013 article in the Journal of Flood Risk Management projected an increase of approximately 30 percent in flood damages by the end of this century, assuming no change in built infrastructure or values.
These trends in flooding frequency and severity are likely to continue, meaning federal, state, and local governing bodies and agencies must be prepared. An article in Nature found that the recent emergence of a statistically significant positive trend in the risk of great floods is consistent with results from a climate model, and that the model suggests the trend will persist.
Public safety officials must arm themselves with accurate, real-time data that gives a complete picture of the conditions they face in order to prepare for an increase in the frequency and severity of floods. Improved visibility allows officials to make faster, more informed decisions based on aggregated data rather than estimating when a disaster will affect certain assets. Traditionally, several disparate sources were needed to gather information on the two key factors that determine flood conditions: flooding parameters and weather conditions. Today, software can integrate all of this information into a single source that provides a clear, real-time snapshot of the environment.
Flooding Parameters: What to Watch for
A clear understanding of the most critical parameters to monitor for a potential flood is essential: water level, tide prediction, and wave height. New technology gives asset managers the ability to set alert parameters and to be notified whenever one of these factors crosses a threshold that triggers a response.
Tide prediction: Tide predictions are based on the gravitational force of the moon and sun acting on large bodies of water at a given time. These are especially helpful when severe weather and rainfall are approaching, as the tide can significantly impact the severity of conditions. For example, a storm that hits at low tide may not require the same level of preparation and resources as a storm that arrives at high tide.
Water level: The measurement of a body of water’s level is an obvious factor in determining the likelihood of a flood, but quick access to accurate readings may be difficult. By plotting water level observations from thousands of ocean buoys and river gauges alongside other high-impact weather information, such as radar and local storm reports, a public safety manager gains situational awareness through a real-time, geographic representation of high-impact ocean tides and river depths.
In addition, hourly forecasts of ocean water levels give critical guidance on where large-scale weather patterns will affect tide levels in the future. This pinpoints specific times of day when a normal tide may be higher than expected due to strong storms and winds.
Wave height: Not only should an official have the technology that indicates exactly when high tide will occur and its level, but he or she should also have an understanding of wave conditions during that time to better predict how communities and infrastructure may be impacted. High waves may have little impact if they are predicted to occur during low tide. These same high waves occurring during a higher than normal tide may have a much higher impact and will direct the size and type of response necessary.
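The alerting described above amounts to threshold checks over the three parameters, with combined conditions (high waves coinciding with a higher-than-normal tide) mattering more than any single reading. A minimal sketch, with hypothetical thresholds and readings:

```python
# Minimal flood-alert sketch. Thresholds and readings are hypothetical.
THRESHOLDS = {"water_level_m": 4.0, "tide_m": 1.5, "wave_height_m": 3.0}

def flood_alerts(reading):
    """Return parameters that crossed their thresholds, plus a combined
    alert when high waves coincide with a higher-than-normal tide."""
    alerts = [k for k, limit in THRESHOLDS.items() if reading[k] >= limit]
    if (reading["wave_height_m"] >= THRESHOLDS["wave_height_m"]
            and reading["tide_m"] >= THRESHOLDS["tide_m"]):
        alerts.append("high_waves_at_high_tide")
    return alerts

print(flood_alerts({"water_level_m": 3.2, "tide_m": 1.8, "wave_height_m": 3.5}))
# ['tide_m', 'wave_height_m', 'high_waves_at_high_tide']
```

Real systems layer forecast data on top of observations, so the same check can fire hours ahead of the predicted coincidence rather than when it happens.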
Weather Forecasting is Essential
Current technologies can combine real-time data with accurate weather forecasting to give public safety officials an even more powerful tool for storm preparedness. Operators now have the ability to use a real-time map of water conditions that can also display hourly forecasts to achieve optimal awareness of when water levels, tides, and wave heights will coincide with high-impact weather.
The climate is changing, and with it the frequency and increasing severity of extreme weather events, including floods. Emergency managers must have a clear view of current circumstances in their territory, as well as an accurate understanding of how forecasted weather events will strengthen or mitigate the impact of severe weather. With an aggregated view of flooding parameters and integrated weather forecasts, resources can be mobilized more quickly, communities can receive more advanced warning, and managers can be better informed to protect people and infrastructure.
Jeremy Duensing is the transportation product manager at Schneider Electric, based in Minneapolis, Minn. He joined Schneider Electric in September 1998 and has worked in the weather department since graduating from the University of Nebraska-Lincoln with a bachelor’s degree in meteorology. In addition to several years on the forecast desk, he started a forecast verification program, has most recently been involved in expanding weather products and services for the transportation community, and is a member of the American Meteorological Society ITS/Surface Transportation Weather Committee.
Jeff Johnson has been with Schneider Electric for more than 30 years and currently serves as the company’s chief science officer. Johnson is also a Certified Consulting Meteorologist, a certification that reflects the highest level of professionalism and overall knowledge of the science of meteorology.