Cyber attacks, natural disasters, pandemic-related business outages – IT operations are exposed to increasing risks that are difficult to calculate. Regardless of size and industry, any company can be affected. To get back to business as fast as possible after an IT security incident, responsible parties need a complete emergency plan. We talked to Walter Bühner, Senior Consultant for IT Security at DGC, about what successful disaster recovery entails:
What is disaster recovery in cyber security?
In terms of IT security, disaster recovery refers to the ability of an organization or individual to respond to the aftermath of an incident and regain access to their IT infrastructure – with the goal of restoring systems to functionality.
Critical events can include cyber attacks, natural disasters such as the floods in the German Ahr Valley, or business disruptions such as those that occurred in the wake of the Covid-19 pandemic. Taking a ransomware attack as a specific example, disaster recovery describes a company’s options for restoring the systems paralyzed by the malware as quickly and securely as possible so that business operations can resume.
For this to succeed, appropriate methods and measures should be combined in a Disaster Recovery Plan (DRP). This documented and structured disaster recovery plan is part of business continuity, which is intended to ensure the maintenance and continuation of business in the event of a catastrophe.
Disaster Recovery Plan: What should (IT) companies consider when setting it up?
A disaster recovery plan, also known as an IT disaster recovery strategy, contains concrete recommendations for action, responsibilities and contact information to enable companies to act quickly and correctly after an event. There are five key pillars to include:
1. Composition of the disaster recovery team
Companies that determine in advance which people should be involved in disaster recovery are clearly at an advantage. This can minimize response times and damage to IT resources. The disaster recovery team ideally consists of various experts and decision-makers to ensure maximum efficiency. A crisis manager takes the leading role in coordination and problem solving. The business continuity expert is responsible for evaluating the disaster recovery plan: he or she reviews the selected measures to ensure that the decisions made meet the requirements derived from the business impact analysis. The IT infrastructure team and the application team must also be involved: while some experts specialize in the recovery of infrastructure components such as servers or storage, others ensure that applications are restored and integrated.
2. Risk assessment of individual IT assets and asset groups
What kind of hardware, software systems and information exist in the enterprise? What specific risks are these assets or asset groups exposed to – and what is the probability of occurrence of these hazard scenarios? As part of strategic IT risk management, responsible parties should obtain an overall view of all IT assets and examine the associated risks.
3. Identification of business-critical assets
The next step is to identify business-critical assets and the associated risks for operations. These vary depending on the industry and business model. Two real-world examples: for a large media company that publishes print media, the printing presses are the business-critical assets that deserve special attention in the disaster recovery plan. In the case of an online tour operator or a car rental company, on the other hand, it is the booking systems that need a disaster recovery plan.
4. Developing a backup strategy
After data loss or data theft, backups are invaluable: an up-to-date and complete backup allows a large part of the business-critical data to be restored. All data should therefore be copied regularly, according to a defined backup strategy, and stored in a secure external location. The cloud, for example, offers a modern backup solution.
5. Defining the test and optimization cycle
Companies should define the test and optimization cycle of their disaster recovery. The first step is to determine the intervals at which the defined disaster recovery plan should be tested. In addition, the handling of concrete test results should be clarified. Ideally, the results are used to optimize the plan in order to be prepared for an emergency.
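The risk assessment and the identification of business-critical assets described in pillars 2 and 3 can be illustrated with a small sketch. It ranks assets by a simple score of likelihood times impact and flags those above a threshold. The 1 to 5 scales, the threshold of 15 and the example assets are assumptions chosen for illustration, not values from a standard or from DGC.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (business-stopping)

    @property
    def risk_score(self) -> int:
        # Simple qualitative scoring: likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank_assets(assets, critical_threshold=15):
    """Return (asset, is_critical) pairs, highest risk score first.

    critical_threshold is a hypothetical cut-off for this example.
    """
    ranked = sorted(assets, key=lambda a: a.risk_score, reverse=True)
    return [(a, a.risk_score >= critical_threshold) for a in ranked]


# Hypothetical inventory for illustration only.
assets = [
    Asset("booking system", likelihood=4, impact=5),
    Asset("office printers", likelihood=2, impact=1),
    Asset("file server", likelihood=3, impact=4),
]

for asset, critical in rank_assets(assets):
    label = "business-critical" if critical else "standard"
    print(f"{asset.name}: score {asset.risk_score} ({label})")
```

In practice the scores would come from the risk assessment itself, and the threshold would be agreed with the business continuity expert rather than fixed in code.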
RTO vs. RPO: How companies can calculate their own RTO and define downtimes
An important part of disaster recovery is transparency: companies should know how much IT downtime and data loss they can tolerate without suffering irreparable damage. Two metrics are used to quantify this: the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). The RTO specifies the maximum period for which an infrastructure, a system or individual processes may be unavailable. The RPO, on the other hand, specifies the maximum amount of data, expressed as the period since the last backup, that may be lost without causing significant damage. Both metrics are essential for DRP planning. To determine them successfully, responsible parties should consider the following facts and differences:
Know the intended use
The RPO is used to develop a backup strategy that takes into account the data loss that may occur during an event. The RTO, on the other hand, is used to plan overall disaster recovery and helps to select a suitable recovery strategy.
RTO values are focused on productivity and the impact of an event on customers of the company. Among other things, the values describe the time it takes for a system or application to be ready for use again and what impact an outage has on productivity. RPO values, on the other hand, deal only with the data loss that can occur – and what impact that data loss can have on customer relationships and operations.
RTO costs are higher than RPO costs because RTO looks at the entire IT infrastructure and RPO only looks at the data loss element.
RPO values are easier to automate because they only require good backup planning with reasonable intervals and a backup strategy for recovery. With the RTO, this is possible only partially or not at all, since it covers the entire infrastructure.
RPOs can be calculated by observing data usage and planning a backup system and backup interval accordingly. RTOs are more difficult to determine because many different factors come together, such as the day on which an incident occurs, the time of day, network capacities, etc.
For a realistic calculation of both values, the data under review should be divided into critical and non-critical data so that an appropriate priority and target can be set for each.
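The relationship between backup interval and RPO can be sketched in a few lines. The example assumes the simplest case: if an incident strikes just before the next backup, everything since the last backup is lost, so the worst-case data loss equals the backup interval. It ignores how long a backup takes to complete. The data sets, RPO values and intervals are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # Simplifying assumption: the incident happens right before the next
    # backup, so all changes made since the last backup are lost.
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


# Hypothetical data sets, split into critical and non-critical data.
datasets = {
    "customer orders": {
        "critical": True,
        "rpo": timedelta(minutes=15),
        "backup_interval": timedelta(hours=1),
    },
    "marketing archive": {
        "critical": False,
        "rpo": timedelta(hours=24),
        "backup_interval": timedelta(hours=12),
    },
}

for name, d in datasets.items():
    ok = meets_rpo(d["backup_interval"], d["rpo"])
    kind = "critical" if d["critical"] else "non-critical"
    status = "meets RPO" if ok else "backup interval too long for RPO"
    print(f"{name} ({kind}): {status}")
```

Here the critical data set fails the check because an hourly backup cannot guarantee an RPO of 15 minutes, while the non-critical archive is comfortably within its target.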
How low is the RTO (Recovery Time Objective) really?
The planned RTO specifies the maximum period of time that a system, an infrastructure or an application may be down after an incident. But can this target really be met in practice? How many seconds, minutes and hours does it actually take until one’s own IT infrastructure is operational again, and is that perhaps longer than defined in the DRP? To compare the actual recovery time (Recovery Time Actual, RTA) with the previously defined RTO and to bring the two closer together, it is essential to conduct regular emergency exercises. This is the only way to find out realistically how effectively the individual DRP components are working.
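Comparing the measured recovery times from exercises with the planned RTO can be sketched as follows. The function name, the result fields, the 4-hour RTO and the three measured times are all assumptions for illustration; real figures would come from the tracked exercise results.

```python
from datetime import timedelta


def evaluate_exercises(rto: timedelta, measured: list[timedelta]) -> dict:
    """Compare measured recovery times (RTA) from exercises with the RTO."""
    worst = max(measured)
    average = sum(measured, timedelta()) / len(measured)
    return {
        "worst_rta": worst,
        "average_rta": average,
        # The target only counts as met if even the slowest run stays within it.
        "rto_met": worst <= rto,
        # How far the slowest run overshoots the target (zero if it does not).
        "gap": max(worst - rto, timedelta()),
    }


# Hypothetical values: a 4-hour RTO and three exercise runs.
rto = timedelta(hours=4)
measured = [
    timedelta(hours=3, minutes=30),
    timedelta(hours=5),
    timedelta(hours=4, minutes=15),
]

result = evaluate_exercises(rto, measured)
print(f"Worst RTA: {result['worst_rta']}, average RTA: {result['average_rta']}")
print(f"RTO met: {result['rto_met']}, gap to close: {result['gap']}")
```

Judging by the slowest run rather than the average is a deliberately conservative choice, since an RTO is a maximum that has to hold in the worst case.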
How can a disaster recovery plan be tested and how does the DGC support this?
As briefly mentioned earlier, a disaster recovery plan is best tested in emergency exercises that simulate an event such as a ransomware attack. These exercises should be carried out several times a year, and the recovery times should be tracked each time in order to identify potential problems and derive appropriate actions. This is best done in close, trusting cooperation with an experienced service provider.
The IT security experts at DGC’s Cyber Defense Operation Center (CDOC) support companies in developing an Information Security Management System (ISMS) and disaster recovery plans, and they devise customized emergency exercises. During the (largely) unannounced exercises, the defined emergency procedure (incident response) is tested and the disaster recovery plan is jointly reviewed. This makes it possible to identify potential for optimization, for example with regard to meeting the RTO target.
Are you facing the task of developing a reliable disaster recovery plan or do you need to optimize your IT security concept?
Our experts will be happy to advise you – contact us directly.