
Disaster Recovery Plan

What is Disaster Recovery?


• Restoration of computing and telecommunications services after an event has disrupted those services.
• Events (large or small):
  - Earthquake
  - The terrorist attacks on the World Trade Center, which killed thousands and affected everything from telephones to the New York Stock Exchange
  - Malfunctioning software caused by a computer virus

Why DRP?
• A control might fail, or a threat might occur that management has not considered or has decided to accept as an exposure that cannot be covered by cost-effective controls.
• When disaster strikes, it must still be possible to recover operations and mitigate losses.
• Organizations are required to have a properly documented disaster recovery plan, at the very least to lessen the effect of such a disaster.

Purpose
• Enable the information systems function to restore operations.
• The impact might be localized; for example, a PC user might accidentally delete critical data stored on a hard disk. The impact might, however, be widespread; for example, an organization's mainframe computer installation might be destroyed by fire.

Components of DRP
A disaster recovery plan has four components:
• Emergency Plan
• Backup Plan
• Recovery Plan
• Test Plan

Emergency Plan
• The emergency plan specifies the actions to be undertaken immediately when a disaster occurs. Management must identify the situations that require the plan to be invoked, for example:
  - Major fire
  - Major structural damage
  - Terrorist attack
• The actions to be initiated can vary depending on the nature of the disaster. For example, some disasters require all personnel to leave the information systems facilities immediately; others require a few select personnel to remain behind for a short period to sound alarms and shut down equipment.

Aspects of Emergency Plan

• The plan must show who is to be notified immediately when the disaster occurs, for example management, the police, or the fire department.
• The plan must show any actions to be undertaken, such as shutdown of equipment, removal of files, and termination of power.
• Any evacuation procedures required must be specified.
• Return procedures (e.g., conditions that must be met before the site is considered safe) must be designated.

Backup Plan
• The backup plan must specify:
  - Type of backup
  - Frequency
  - Procedures
  - Location of backup resources
  - Restoration site
  - Personnel
  - Priorities
  - Time frame
• Backup plans can be complex or straightforward.
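The items a backup plan covers can be captured as a simple record, which makes gaps easy to spot. The following is a minimal sketch only: the class name, field names, value formats, and the `missing_items` helper are assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class BackupPlan:
    """One record per system; field names are illustrative assumptions."""
    backup_type: str          # type of backup
    frequency: str            # how often the backup is taken
    procedures: str           # where the backup procedure is documented
    resource_location: str    # location of backup resources
    restoration_site: str     # restoration site
    personnel: str            # personnel involved
    priority: int             # priority (assumed here: 1 is highest)
    time_frame_hours: float   # time frame (assumed here: in hours)


def missing_items(plan: BackupPlan) -> List[str]:
    """Return the names of fields left empty, so gaps in the plan are visible."""
    return [f.name for f in fields(plan) if getattr(plan, f.name) in ("", None)]


# Hypothetical example: the procedures entry has not been filled in yet.
payroll = BackupPlan("full", "daily", "", "offsite vault",
                     "warm site", "operations team", 1, 24.0)
print(missing_items(payroll))  # ['procedures']
```

A straightforward plan might need only one such record; a complex one would hold a record per system.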

Backup Resources
Resource: Nature of backup
Personnel: Training and rotation of duties among information systems staff so they can take the place of others. Arrangements with another company for provision of staff.
Hardware: Outsourcing arrangements for hardware provision.
Facilities: Outsourcing arrangements for the provision of facilities.
Documentation: Inventory of documentation stored securely on site and off site.
Supplies: Inventory of critical supplies stored securely on site and off site, with a list of vendors who provide all supplies.
Data/Information: Inventory of files stored securely on site and off site.
Application software: Inventory of application software stored securely on site and off site.
System software: Inventory of system software stored securely on site and off site.

Backup Sites
• Cold site: If an organization can tolerate some downtime, cold-site backup might be appropriate. A cold site has all the facilities needed to install a mainframe system: raised floors, air conditioning, power, communications lines, and so on. The mainframe is not present, however, and it must be provided by the organization wanting to use the cold site. An organization can establish its own cold-site facility or enter into an agreement with another organization to provide one.
• Hot site: If fast recovery is critical, an organization might need hot-site backup. All hardware and operations facilities will be available at the hot site. In some cases, software, data, and supplies might also be stored there. Hot sites are expensive to maintain. They are usually shared with other organizations that have hot-site needs.
• Warm site: A warm site provides an intermediate level of backup. It has all cold-site facilities plus hardware that might be difficult to obtain or install. For example, a warm site might contain selected peripheral equipment plus a small mainframe with sufficient power to handle critical applications in the short run.

• Reciprocal agreement: Two or more organizations might agree to provide backup facilities to each other in the event of one suffering a disaster. This backup option is relatively cheap, but each participant must maintain sufficient capacity to operate another's critical systems. Reciprocal agreements are often informal in nature.

When a backup site is provided by another party, the agreement should cover:
1. How soon the site will be made available subsequent to a disaster.
2. The number of organizations that will be allowed to use the site concurrently in the event of a disaster.
3. The priority to be given to concurrent users of the site in the event of a common disaster.
4. The period during which the site can be used.
5. The conditions under which the site can be used.
6. The facilities and services the site provider agrees to make available.
7. What controls will be in place and working at the off-site facility.
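The trade-off among cold, warm, and hot sites comes down to how much downtime the organization can tolerate. The sketch below illustrates that reasoning only; the function name and the two threshold values are hypothetical and would have to be set by each organization.

```python
def suggest_backup_site(tolerable_downtime_hours: float,
                        hot_limit_hours: float = 4.0,
                        warm_limit_hours: float = 72.0) -> str:
    """Map tolerable downtime to a site type.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    if tolerable_downtime_hours < 0:
        raise ValueError("tolerable downtime cannot be negative")
    if tolerable_downtime_hours <= hot_limit_hours:
        return "hot site"    # fast recovery is critical
    if tolerable_downtime_hours <= warm_limit_hours:
        return "warm site"   # intermediate level of backup
    return "cold site"       # some downtime can be tolerated


print(suggest_backup_site(1))    # hot site
print(suggest_backup_site(24))   # warm site
print(suggest_backup_site(200))  # cold site
```

Cost is deliberately left out of this sketch; in practice the expense of a hot site would be weighed against the downtime it avoids.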

Recovery Plan
• Recovery plans set out procedures to restore full information systems capabilities. They depend on the circumstances:
  - Whether the disaster is global or localized
  - The nature of the machine
  - The applications data to be recovered
• A recovery committee works out the specifics of the recovery to be undertaken.
• The plan should specify:
  - The responsibilities of the committee
  - Guidelines on the priorities to be followed

Test Plan
• The test plan identifies deficiencies in the emergency, backup, or recovery plans, or in the preparedness of an organization and its personnel for a disaster.
• Periodically, test plans must be invoked; that is, a disaster must be simulated and information systems personnel required to follow backup and recovery procedures.
• To facilitate testing, a phased approach can be adopted. First, the disaster recovery plan can be tested by desk checking, inspection, and walkthroughs, much like the validation procedures adopted for programs. Next, a disaster can be simulated at a convenient time, for example during a slow period in the day.

Business Continuity Plan

• Business continuity planning (BCP) is the act of proactively working out a way to prevent and manage the consequences of a disaster, limiting them to the extent that a business can afford. It determines how a company will keep functioning until its normal facilities are restored after a disruptive event. This encompasses how employees will be contacted, where they will go, and how they will keep doing their jobs.
• Business continuity is the exercise of recovering from an availability interruption or disaster event in minutes instead of days. The chart below depicts the difference between disaster recovery and business continuity.

[Chart: recovery steps for each approach, plotted against a time scale of minutes, hours, and days]
Traditional Disaster Recovery Planning: periodic offsite backup, restore data from backups, identify and enter lost data, resume processing.
Business Continuity Planning: continuous mirroring of data to remote site, perform target takeover and resume processing.

KPIs
• Recovery Point Objective (RPO): The pre-incident point in time to which data must be recovered to resume business transactions (the acceptable transaction data loss).
• Recovery Time Objective (RTO): The maximum elapsed time allowed to recover data and processing capability.
• Together, these KPIs define the levels of service that organizations must consider when assessing business impact.
• Business continuity describes the processes and procedures an organization puts in place to ensure that essential functions can continue during and after a disaster.
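To make the two KPIs concrete, the sketch below checks a backup arrangement against an RPO and an RTO. It assumes periodic backups, where the worst-case data loss equals the interval between backups. All names and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RecoveryObjectives:
    rpo_hours: float  # acceptable data loss, expressed as a time window
    rto_hours: float  # maximum time allowed to restore data and processing


def meets_objectives(objectives: RecoveryObjectives,
                     backup_interval_hours: float,
                     estimated_recovery_hours: float) -> Dict[str, bool]:
    """Compare a backup arrangement with the objectives.

    Assumption: with periodic backups, a failure just before the next
    backup loses one full interval of data.
    """
    return {
        "rpo_met": backup_interval_hours <= objectives.rpo_hours,
        "rto_met": estimated_recovery_hours <= objectives.rto_hours,
    }


# Hypothetical figures: RPO of 4 hours, RTO of 8 hours.
objectives = RecoveryObjectives(rpo_hours=4, rto_hours=8)

# Nightly offsite backups with a 6-hour restore: RTO is met, RPO is not.
nightly = meets_objectives(objectives, backup_interval_hours=24,
                           estimated_recovery_hours=6)
print(nightly)  # {'rpo_met': False, 'rto_met': True}

# Continuous mirroring modelled as a zero interval with a fast takeover.
mirrored = meets_objectives(objectives, backup_interval_hours=0.0,
                            estimated_recovery_hours=0.5)
print(mirrored)  # {'rpo_met': True, 'rto_met': True}
```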

Business Impact Analysis


Business impact analysis is performed to determine the impacts associated with disruptions to specific functions or assets in a firm. These include:
• Operating impact
• Financial impact
• Legal or regulatory impact
For example, should the billing, receivables, and collections functions be crippled by inaccessibility of information, cash flow to the business will suffer. Additional risks are that lost customers will never return, the business's credit rating may suffer, and significant costs may be incurred for hiring temporary help. Lost revenues, additional recovery costs, fines and penalties, overtime, application and hardware costs, lost goodwill, and delayed collection of funds could all be part of the business impact of a disaster.

Risk Analysis
Risk analysis identifies the functions and assets that are critical to a firm's operations and then establishes the probability of a disruption to those functions and assets. Once the risk is established, objectives and strategies to eliminate avoidable risks and minimize the impact of unavoidable risks can be set. A list of critical business functions and assets should first be compiled and prioritized. Next, the probability of specific threats to those functions and assets is determined; for example, a certain type of failure may occur once in 10 years. From the risk analysis, a set of objectives and strategies to prevent, mitigate, and recover from disruptive threats should be developed.
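One common way to turn such probabilities into priorities is to multiply the yearly rate of a threat by the loss expected from one occurrence. The sketch below applies that idea to the "once in 10 years" example; the threat names and loss figures are invented for illustration, and the function names are assumptions.

```python
from typing import Dict, List, Tuple


def annual_rate(occurrences: float, years: float) -> float:
    """Convert 'N occurrences in M years' into an expected rate per year."""
    return occurrences / years


def expected_annual_loss(rate_per_year: float, loss_per_event: float) -> float:
    """Expected yearly loss: rate of occurrence times loss per occurrence."""
    return rate_per_year * loss_per_event


def rank_threats(threats: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Rank threats by expected annual loss, largest first.

    `threats` maps a threat name to (rate_per_year, loss_per_event).
    """
    scored = [(name, expected_annual_loss(rate, loss))
              for name, (rate, loss) in threats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: a fire once in 10 years, a disk failure twice a year.
threats = {
    "disk failure": (annual_rate(2, 1), 5_000),
    "data-centre fire": (annual_rate(1, 10), 500_000),
}
for name, loss in rank_threats(threats):
    print(f"{name}: expected annual loss of about {loss:,.0f}")
```

The ranking only orders threats by expected financial loss; operating and legal impacts would still need separate judgment.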

Disaster Recovery Plan

• A disaster recovery plan (DRP) is an IT-focused plan designed to restore operability of the target systems, applications, or computer facility at an alternate site after an emergency. A DRP addresses major site disruptions that require site relocation. It applies to major, usually catastrophic, events that deny access to the normal facility for an extended period. Typically, disaster recovery planning involves an analysis of business processes and continuity needs; it may also include a significant focus on disaster prevention.

Disaster Tolerance
Disaster tolerance describes an environment's ability to withstand major disruptions to systems and related business processes. Disaster tolerance should be built into an environment at various levels and can take the form of hardware redundancy, high availability/clustering solutions, multiple data centers, elimination of single points of failure, and distance solutions.

Bare Metal Recovery

• A bare metal recovery is the process of restoring a complete system, including system and boot partitions, system settings, applications, and data, to its original state at some point prior to a disaster.
• High availability describes a system's ability to continue processing and functioning for a very high percentage of the time, for example 99.999%. High availability can be implemented in IT infrastructure by eliminating single points of failure (SPOF) through redundant components. Similarly, clustering and coupling applications between two or more systems can provide a highly available computing environment.
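An availability percentage is easier to judge when converted into the downtime it allows. The sketch below does that conversion and also shows how redundant components raise availability. It assumes a 365-day year, and the redundancy formula assumes components fail independently and that one working copy is enough; function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes_per_year(availability: float) -> float:
    """Downtime per year permitted by an availability fraction (0.0 to 1.0)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes independent failures and that a single working copy suffices.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1.0 - (1.0 - component_availability) ** copies


# 99.999% availability allows roughly 5.26 minutes of downtime per year.
print(round(allowed_downtime_minutes_per_year(0.99999), 2))

# Two redundant components at 99% each give roughly 99.99% together.
print(round(redundant_availability(0.99, 2), 6))
```

Real components often share failure causes such as power or network, so the independence assumption tends to overstate the benefit of redundancy.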
