Monday, February 28, 2005
Dealing with large-scale disasters
By Mike Talon, TechRepublic
Friday, February 25, 2005, 8:23 PM
In any column dealing with Business Continuity Planning (BCP) and Disaster Recovery (DR), there will no doubt come a time when the discussion must turn to large-scale disasters.
Man-made disasters have received a great deal of press attention, and coverage of natural disasters has surged lately, with hurricane after hurricane striking multiple cities. Both kinds of disaster can, and do, destroy systems and even entire sites, to say nothing of the loss of life that follows these events. How will your organization handle this type of disaster?
No organization can claim readiness for large-scale disasters without addressing the trinity of BCP: Human Resources, Facilities Management, and Information Technology. These three groups must work in concert to overcome a disaster's impact; as an IT professional, you cannot do it alone. Even with all three groups working together, the task can seem overwhelming, but if you break it down into its component parts, you can manage the event and keep your business systems running.
The first order of business is to get good information flowing in. In the wake of a major disaster, natural or man-made, you will face a flood of information that you must sift through to separate what is real from what is imagined or exaggerated. Case in point: after the initial shock of the power failures in the Northeast United States in August 2003, many people were convinced it was a terrorist attack, when in fact it was a large-scale technology failure across several systems. Finding out what happened and what resources you still have available is a vital first step in dealing with a disaster.
Your next priority is to get good information flowing out. Make sure everyone who needs to be in the loop during the initial recovery process is available, or that substitutes are brought in. This may sound easy, but remember that landline and mobile phone service may be interrupted, e-mail systems will probably be offline, and other communication systems may behave erratically. Find the channels that are still working and get the word out as soon as possible.
Ideally, you determined Recovery Time Objectives (RTOs) for your various systems before the disaster struck. If not, there is little you can do except try to bring everything back up as quickly as possible. If you do have RTOs, start with the systems that have the shortest recovery times, bring those up in alternate locations first, and leave all other systems for later, no matter how loudly people demand that you bring them up sooner.
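The RTO-first ordering described above amounts to a simple sort. The minimal Python sketch below assumes a hypothetical inventory in which each system carries the RTO (in hours) agreed before the disaster, or `None` if no RTO was ever set; the `System` class, the `recovery_order` function, and the sample systems and hours are illustrative only and are not taken from any real DR tool. Systems without an RTO are pushed to the end of the queue, and ties are broken alphabetically so the order is repeatable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class System:
    # Hypothetical inventory record: a system name plus the RTO (in hours)
    # agreed before the disaster, or None if no RTO was ever set.
    name: str
    rto_hours: Optional[float]


def recovery_order(systems: List[System]) -> List[str]:
    """Return system names in restoration order: shortest RTO first,
    systems with no agreed RTO last, ties broken by name."""

    def sort_key(s: System):
        has_no_rto = s.rto_hours is None  # False sorts before True
        rto = s.rto_hours if s.rto_hours is not None else 0.0
        return (has_no_rto, rto, s.name)

    return [s.name for s in sorted(systems, key=sort_key)]


if __name__ == "__main__":
    # Illustrative systems and RTO values only.
    inventory = [
        System("payroll", 72),
        System("order-entry", 4),
        System("intranet-wiki", None),
        System("file-server", 24),
        System("email", 24),
    ]
    print(recovery_order(inventory))
    # prints ['order-entry', 'email', 'file-server', 'payroll', 'intranet-wiki']
```

The point of sorting on agreed numbers rather than on who is shouting loudest is that the order was decided calmly, before the disaster; the code simply makes that pre-agreed queue explicit.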
At this point, concentrate your staff on the most important systems, however urgently panicked colleagues press you about systems that everyone agreed, before the disaster, were less important. Keep in mind that this may mean finding alternate data-center space and acquiring new hardware if you haven't already planned for these eventualities. This is where Facilities Management comes in, making sure you have a location to set all of this up.
Finally, once the urgent issues have been addressed, you can begin bringing up other data systems as time and equipment allow. If you work in a smaller shop, HR, Facilities, and IT may all be the same person, which makes your job easier and harder at the same time, but all three functions must still be brought into the equation.
A large-scale disaster is something no one wants to face. Recent events have shown, unfortunately, that it is an eventuality no organization can afford to ignore.