Guide to Developing AWS Disaster Recovery Plan

Want to implement a disaster recovery plan but don’t know where to start? This blog will be useful for any company that owns IT infrastructure. It doesn’t matter where you store your data: on-premises or in the cloud, it has to be protected.

by Ostap Demkovych

April 30, 2020; 13 min read

Application Modernization AWS DevOps

Did you know that 75% of SMEs still operate without a disaster recovery plan in place? Many organizations still stick with a traditional disaster recovery policy, which usually involves housing all critical data at an off-site location or two, often in a reciprocal arrangement with another organization. But things have changed, and there is now a much more efficient and cost-effective solution: cloud disaster recovery, and in particular using Amazon Web Services (AWS) as a disaster recovery site.

But simply being in the cloud today and assuming all will be well is not the solution. You need a comprehensive plan so that you know what your total infrastructure is, where it is, what is critical, what should be moved to the cloud, and what specific AWS cloud services you need in order to resume operations as quickly as possible.

Learn how to create a comprehensive disaster recovery plan on AWS to mitigate downtime and ensure business continuity.

What Is a Disaster Recovery Plan and Why It Matters

An AWS disaster recovery plan consists of specific instructions on how to get your servers running again if something disrupts your systems. It is easy to be skeptical about disaster recovery planning until something happens. An outage can have many causes: a natural disaster, a cyber attack, ransomware, even a faulty cooling system. For this reason, it’s better to have a precise plan for surviving a disaster without losing critical data, customers, or money.

Here are 5 benefits you can gain with a recovery plan:

Secured business continuity.
Essential data protection.
A fast server recovery.
Effective cost-saving.
Good reputation among customers.

AWS Disaster Recovery Strategies

Developing a recovery plan can be a time-consuming task, but it’s worth it. The RTO (Recovery Time Objective) and the RPO (Recovery Point Objective) are the essential elements of disaster recovery planning. RTO is the maximum time your company can afford to be offline after an outage. RPO is the maximum amount of data loss you can tolerate, measured as time: an RPO of two hours means losing at most the last two hours of data. If two hours of data loss makes you panic, then it would be wise to pay more for maintenance and more frequent backups. On the other hand, if 24 hours of downtime isn’t critical, then consider choosing a cheaper option.

Business impact analysis predicts possible outcomes of a disaster. It allows us to see how work disruptions can affect the data, reputation and finances. Such a report shows the most critical places that must be protected in the first place.

Once RTO and RPO are defined for your business, it’s easier to understand how complex the DR setup needs to be. An Amazon Web Services disaster recovery plan can follow one of four options: backup and restore, pilot light, warm standby, and hot standby.
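To make the link between objectives and strategy concrete, here is a minimal Python sketch that maps an RTO and RPO to one of the four options. The cut-off values are purely illustrative assumptions, not AWS guidance, and the function name `suggest_dr_strategy` is made up for this example.

```python
from datetime import timedelta


def suggest_dr_strategy(rto: timedelta, rpo: timedelta) -> str:
    """Suggest one of the four DR options from the tighter of RTO and RPO.

    The thresholds below are illustrative assumptions; tune them to the
    results of your own business impact analysis.
    """
    tightest = min(rto, rpo)
    if tightest <= timedelta(minutes=1):
        return "hot standby"
    if tightest <= timedelta(minutes=30):
        return "warm standby"
    if tightest <= timedelta(hours=4):
        return "pilot light"
    return "backup and restore"


# A business that tolerates a full day of downtime and data loss
print(suggest_dr_strategy(timedelta(hours=24), timedelta(hours=24)))
# A business that needs to be back within minutes
print(suggest_dr_strategy(timedelta(minutes=10), timedelta(minutes=5)))
```

The first call suggests backup and restore, the second warm standby. In practice the decision also depends on budget, which is discussed later in this guide.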

Backup and Restore

Tape libraries and on-premises storage are traditional backup solutions that struggle to keep up with data growth. The International Data Corporation has estimated that by 2025 the amount of data worldwide will expand to 175 zettabytes. Cloud infrastructure is a more durable, cost-efficient, and easy-to-use solution. Amazon S3 and AWS Storage Gateway are reliable tools for this type of protection.

Pilot Light

Pilot light is another way to prevent data loss. Critical data is constantly replicated to the cloud, where a minimal core of the system is kept running. The other parts of the IT infrastructure stay turned off and are started only for testing or recovery. The standby system is like a small flame, ready to heat a whole structure when the recovery begins.

Warm Standby

Whereas pilot light keeps only the most critical parts of the system always on, warm standby keeps a scaled-down copy of the whole system running in the cloud, which reduces the RTO and RPO to a matter of minutes.

Hot Standby

This DR solution is the most reliable one. It uses a multi-site setup in which a one-to-one copy of your infrastructure runs in another region. For example, if your source servers shut down because of high outside temperatures in Miami, the copy will keep operating somewhere else.

Step-By-Step Guide to Developing a Disaster Recovery Plan with AWS

If you follow an organized plan for disaster recovery development and then actually implement that plan, you should be able to ensure business continuity no matter what. Below is your step-by-step guide for developing a disaster recovery plan with AWS.

Step 1. Identify and Describe All of Your Infrastructure

Remember this: your infrastructure includes all of your hardware, software, networks, storage, your current serverless solutions, and your production and sandbox environments. If you already have cloud infrastructure in AWS, be certain to use the AWS Tag Editor to find all of the resources you have there and properly map them.
If you do not, then you have some additional work cut out for you. Take your time to identify all of your infrastructure so that nothing is left out.
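Once you have a list of resources, it helps to group them so that gaps are easy to spot. The sketch below assumes you have exported resource ARNs (for example, from the Tag Editor) into a plain list; the sample ARNs and the `group_arns` helper are hypothetical.

```python
from collections import defaultdict


def group_arns(arns):
    """Group resource ARNs by (service, region) for a quick inventory map."""
    inventory = defaultdict(list)
    for arn in arns:
        # ARN layout: arn:partition:service:region:account-id:resource
        parts = arn.split(":", 5)
        if len(parts) != 6 or parts[0] != "arn":
            raise ValueError(f"not an ARN: {arn!r}")
        service = parts[2]
        region = parts[3] or "global"  # some services, such as S3, omit the region
        inventory[(service, region)].append(parts[5])
    return dict(inventory)


# Hypothetical sample data
sample = [
    "arn:aws:ec2:us-east-1:111122223333:instance/i-0abc",
    "arn:aws:rds:us-east-1:111122223333:db:orders",
    "arn:aws:s3:::example-backups",
]

for (service, region), resources in sorted(group_arns(sample).items()):
    print(service, region, resources)
```

A service or region that shows up here but not in your DR notes is a sign that something has been left out of the plan.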

Step 2. Involve the Entire Development Team

Whether you have an in-house team or use IT development services on a contractual basis, those responsible for development must be consulted. You have to know what dependencies exist among your infrastructure elements. And don’t forget the third-party resources you may be using (e.g. Google Maps API). You can use the AWS Status Page and your Personal Health Dashboard to receive additional insights about planned maintenance and updates. Later, this process can be automated using the AWS Health API.

Establishing dependencies and mapping infrastructure is a time-consuming process and one that must occur over time. It’s a bit like a grocery list – you keep adding to it as new items come to mind.
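A dependency map also tells you in which order things have to come back after a failure. As a rough sketch, the mapping below (with made-up component names) is fed to Python’s standard `graphlib` module, which returns an order where every component appears after the things it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each component lists what it depends on
dependencies = {
    "web-frontend": {"orders-api"},
    "orders-api": {"orders-db", "maps-api (third party)"},
    "orders-db": set(),
    "maps-api (third party)": set(),
}


def recovery_order(deps):
    """Return components ordered so that dependencies are restored first."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

If someone adds a circular dependency to the map, `static_order()` raises `graphlib.CycleError`, which is a useful early warning on its own.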

Step 3. Identify the Importance of Each Infrastructure Element

A database that houses all of your current customer information and interaction history is obviously more important than one housing your “dead” customer information. These elements should be prioritized because that prioritization will matter when you determine the elements to be recovered and the given time frame in which you want to recover them. Those that can be recovered more slowly, for example, will cost you less.

It’s best to ask yourself the question, “What would happen if ________ (name of an element) went down?” Listing the consequences will help you prioritize their level of importance.

Step 4. Discuss RTO and RPO with Stakeholders

People in your customer service department have needs related to:

RTO (recovery time objective) – when they can expect to be back up and running.
RPO (recovery point objective) – the point in the past where they will be taken once recovery occurs.

Your HR people may have a different view, as will the warehouse management staff. Ideally, RTO would be equally rapid for all, but that is probably not realistic. These, too, must be prioritized. The shorter the required RTO, the more expensive the recovery will be. Striking the optimal balance between the “wants” and “needs” will be essential. In our previous post, we shared several effective cost optimization strategies for AWS that can be applied towards DR as well.
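When several departments rely on the same system, the strictest request usually ends up driving the design. The sketch below shows one way to consolidate such requests; the departments, system names, and numbers are hypothetical.

```python
from datetime import timedelta

# Hypothetical requests gathered from stakeholders: (department, system, RTO, RPO)
requirements = [
    ("customer service", "crm-db", timedelta(hours=1), timedelta(minutes=15)),
    ("hr", "crm-db", timedelta(hours=8), timedelta(hours=4)),
    ("warehouse", "inventory-api", timedelta(hours=4), timedelta(hours=1)),
]


def strictest_objectives(reqs):
    """Return {system: (rto, rpo)} using the shortest RTO and RPO requested."""
    result = {}
    for _department, system, rto, rpo in reqs:
        if system in result:
            current_rto, current_rpo = result[system]
            result[system] = (min(current_rto, rto), min(current_rpo, rpo))
        else:
            result[system] = (rto, rpo)
    return result


for system, (rto, rpo) in strictest_objectives(requirements).items():
    print(f"{system}: RTO {rto}, RPO {rpo}")
```

The consolidated numbers are a starting point for the “wants” versus “needs” discussion, not the final answer.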

Step 5. Cost Optimization in AWS Disaster Recovery

Implementing a disaster recovery plan on AWS doesn’t have to break the bank. With the right strategies, you can ensure high availability and data protection while keeping costs under control. Here are some tips for cost optimization in AWS disaster recovery:

Right Sizing

Right Sizing involves ensuring that you are using the most cost-effective instances to meet your needs. AWS provides a variety of instance types optimized to fit different use cases.

Reserved Instances

Reserved Instances can be used for predictable workloads, offering significant discounts compared to on-demand instance pricing.

Auto Scaling

With Auto Scaling, you can ensure that you have the right number of Amazon EC2 instances available to handle the load for your application. This not only improves performance and availability but also reduces cost by scaling down automatically when demand is low.

Lifecycle Policies

Implementing lifecycle policies can help move infrequently accessed data to cheaper storage classes to save costs.

Together, these four strategies help you strike a balance between ensuring business continuity and managing costs in your AWS disaster recovery plan. The remaining tips target data transfer, storage, and cleanup.

Data Transfer Optimization

Data transfer costs can add up quickly in a disaster recovery setup. Optimize your data transfer by using services like AWS Direct Connect and data transfer methods like compression and deduplication.

Storage Optimization

Choose the right storage class for your data. AWS offers several storage classes in S3, each with its own pricing model. Use lifecycle policies to automatically move data to cheaper storage classes when it’s not accessed frequently.

Automate Cleanup

Automate the cleanup of old backups and snapshots that are no longer needed. This can be done using lifecycle policies in S3 and automated scripts for EBS snapshots.

Remember, cost optimization is a continuous process. Regularly review and adjust your AWS resources and configurations to ensure you’re getting the most value for your money.
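Lifecycle policies and snapshot cleanup can be sketched in a few lines of Python. The first function builds a lifecycle configuration in the dictionary shape that boto3’s `put_bucket_lifecycle_configuration` accepts; the second picks snapshots older than a retention period from records shaped like the ones `describe_snapshots` returns. The day counts, the prefix, and both function names are assumptions for illustration, and nothing here calls AWS.

```python
from datetime import datetime, timedelta, timezone


def backup_lifecycle_rule(prefix, to_ia_days=30, to_glacier_days=90, expire_days=365):
    """Build an S3 lifecycle configuration for objects under `prefix`."""
    return {
        "Rules": [{
            "ID": f"tier-and-expire-{prefix.strip('/')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            # Move backups to cheaper storage classes as they age
            "Transitions": [
                {"Days": to_ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": to_glacier_days, "StorageClass": "GLACIER"},
            ],
            # Delete backups once they are past the retention period
            "Expiration": {"Days": expire_days},
        }]
    }


def snapshots_to_delete(snapshots, keep_days, now=None):
    """Return the IDs of snapshots started more than `keep_days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Hypothetical snapshot records
now = datetime(2020, 4, 30, tzinfo=timezone.utc)
snapshots = [
    {"SnapshotId": "snap-old", "StartTime": now - timedelta(days=45)},
    {"SnapshotId": "snap-new", "StartTime": now - timedelta(days=3)},
]
print(snapshots_to_delete(snapshots, keep_days=30, now=now))  # ['snap-old']
```

With boto3 installed and credentials configured, the first dictionary could be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`, and the returned IDs to `delete_snapshot`. Try both against a test account first.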

Step 6. Use Your Accumulated Information to Create Your Plan

Every organization is unique, and so is the disaster recovery plan for its infrastructure. It could be as simple as weekly or monthly backups or as complex as maintaining an exact, continuously updated copy of your infrastructure in another cloud, something a bank would do, for example.

Here is where you make use of all of the cloud-based tools available to you. Your task is to determine which of these tools will satisfy your DR needs. AWS provides flexible solutions that can be customized based on your needs. For instance, you can use RDS to create Multi-AZ deployments and automate backups with built-in rotation as a basic DR plan. This option is best for business owners who are not tech-savvy. You can also browse AWS Marketplace for ready-made disaster-recovery-as-a-service solutions built by others, such as remote backup tools or automated environment duplication. No matter which option you choose, there are some universal factors worth considering prior to adoption:

Downtime Cost vs. Backup/Recovery Cost
There will be choices to make between the projected income loss during downtime and the cost of backup and restoration of your data. If you can withstand a longer downtime period, you can select a slower, less expensive option, but if you cannot, then you will want something like the more expensive duplicate production environment.
How much data loss is acceptable?
Do you need ongoing and immediate backup or can you afford to lose several hours’ worth?
Which specific backup options are best suited to your circumstances?
Within EC2, for example, you have a choice between Amazon Machine Images (AMIs) and EBS snapshots. In general, instance store-backed AMIs are slower and less flexible and cost more than EBS snapshots. But there are also strengths that come with that additional cost.
How will you automate your backup and how should you choose an additional region for copies of those backups?
There is a wealth of AWS tools for automating your backups so that you can rest easy knowing that the process goes on without your intervention. And you can literally select an additional region for backup half a world away. You will want to make use of AWS disaster recovery management tools, many of which are available with a few clicks in the AWS console. There are also custom solutions available via AWS Marketplace, including options ranging from “pilot light” to “hot standby.”
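The first question above, downtime cost versus backup and recovery cost, comes down to simple arithmetic. The sketch below compares two options by adding the yearly DR spend to the revenue expected to be lost while recovering. All prices, downtime figures, and the one-outage-per-year default are hypothetical.

```python
def expected_annual_cost(dr_cost_per_month, downtime_hours, loss_per_hour, outages_per_year=1):
    """Yearly DR spend plus the revenue expected to be lost while recovering."""
    return 12 * dr_cost_per_month + outages_per_year * downtime_hours * loss_per_hour


# Hypothetical options: (monthly DR cost in USD, expected downtime per outage in hours)
options = {
    "backup and restore": (200, 24),
    "warm standby": (3000, 0.5),
}


def cheapest_option(loss_per_hour):
    """Pick the option with the lowest expected annual cost."""
    return min(options, key=lambda name: expected_annual_cost(*options[name], loss_per_hour))


print(cheapest_option(5000))  # warm standby
print(cheapest_option(100))   # backup and restore
```

At 5,000 USD of lost income per hour, backup and restore costs 2,400 + 120,000 = 122,400 USD a year against 36,000 + 2,500 = 38,500 USD for warm standby. At 100 USD per hour the picture flips: 4,800 USD against 36,050 USD.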

Step 7. Establish the In-House Communication Network

What happens in-house when a failure occurs? Who is responsible for monitoring server parameters and alerts? Amazon CloudWatch allows monitoring and automated responses through CloudWatch Events and Lambda. And there are plenty of additional IT infrastructure monitoring tools worth considering.
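As an illustration of what such monitoring can look like, the sketch below builds the parameters for a CloudWatch alarm that fires when an EC2 instance fails its status checks and notifies an SNS topic. The keys follow boto3’s `put_metric_alarm` call; the instance ID, the topic ARN, the two-minute window, and the helper name are assumptions for this example. The code only builds the dictionary and does not contact AWS.

```python
def status_check_alarm(instance_id, sns_topic_arn):
    """Build put_metric_alarm parameters for a failed EC2 status check."""
    return {
        "AlarmName": f"dr-status-check-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # evaluate the metric every 60 seconds
        "EvaluationPeriods": 2,  # alarm after two failing periods in a row
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # whoever is on call subscribes to this topic
    }


# Hypothetical identifiers
params = status_check_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:111122223333:dr-alerts",
)
print(params["AlarmName"])
```

With boto3 installed and credentials configured, the alarm could be created with `boto3.client("cloudwatch").put_metric_alarm(**params)`.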

But the big question is: who will be responsible for round-the-clock monitoring and execution of your DR plan when things go awry all of a sudden? Again, you have several choices:

Re-assign developers from your in-house team to monitor and fine-tune your infrastructure, and run DR scenarios. However, this means that you’ll be taking people away from other projects, and will have to pay them overtime if things get messy over the weekend, for instance.
Hire a DevOps support team, who will manage your IT support 24/7, report on new findings and continuously optimize your infrastructure performance. Typically, this option involves fewer costs for your business in the long run, as well as less stress on your internal teams.

Step 8. Testing and Re-Testing

If you don’t test your plan, you are asking for trouble. Setting up DR is just part of the job. It’s way more important to ensure that those measures will work when a disaster strikes. Again, AWS has some neat tools for staging and testing your solutions. For instance, you can create a duplicate environment where you can stage real-world scenarios and see how your systems will perform. Such testing procedures should occur on a regular basis.
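A test is only useful if you compare its outcome with your objectives. The sketch below records three timestamps from a drill and checks them against the RTO and RPO targets. The `DrillResult` structure and the sample times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    outage_start: datetime      # when the simulated failure began
    service_restored: datetime  # when the standby environment served traffic again
    last_good_backup: datetime  # timestamp of the newest data that was recovered


def evaluate_drill(result, rto_target, rpo_target):
    """Compare the measured recovery time and data loss with the targets."""
    recovery_time = result.service_restored - result.outage_start
    data_loss_window = result.outage_start - result.last_good_backup
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto_target,
        "rpo_met": data_loss_window <= rpo_target,
    }


drill = DrillResult(
    outage_start=datetime(2020, 4, 30, 10, 0),
    service_restored=datetime(2020, 4, 30, 10, 45),
    last_good_backup=datetime(2020, 4, 30, 9, 30),
)
report = evaluate_drill(drill, rto_target=timedelta(hours=1), rpo_target=timedelta(minutes=15))
print(report["rto_met"], report["rpo_met"])  # True False
```

In this made-up drill the service was back within the one-hour RTO, but thirty minutes of data were lost against a fifteen-minute RPO, so backups would need to run more often.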

Disaster recovery is no longer optional. Given the increase in natural disasters, not to mention system breaches, ransomware, etc., a disaster recovery plan is a “must” in this new physical and digital environment. All of this can seem pretty daunting to a small or mid-sized organization, especially when there is no in-house IT team on board. And yet, the need for a robust disaster recovery solution is there.

AWS provides a comprehensive set of solutions for disaster recovery, but understanding those solutions and choosing those that meet your needs and your budget constraints will require a great deal of study, discussion, and planning.

Just developing the plan can prove challenging, because those involved have to have enough understanding of cloud-based DR services, what is available, and how the myriad of solution options can be pared down to fit an individual circumstance.

While AWS has an extensive tutorial system in place and an easy user interface, it cannot make decisions and choices for you.

Additional Tips on Disaster Recovery Planning

If you need more information, look through these three small tips. They will deepen your knowledge about disaster recovery planning.

Use Another Region as the Airbag

Amazon S3 and DynamoDB are popular AWS storage services. They have gained popularity because of their durability and easy interfaces. They automatically replicate your data across multiple locations within a region. However, a natural disaster such as a hurricane can shut down a whole region. So we recommend copying your data to multiple regions.
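For S3, copying data to another region can be set up with cross-region replication. The sketch below builds a replication configuration in the dictionary shape that boto3’s `put_bucket_replication` accepts. The IAM role ARN, the bucket name, and the helper name are hypothetical, and the code only builds the dictionary without calling AWS.

```python
def replication_config(role_arn, destination_bucket, prefix=""):
    """Build an S3 replication configuration that copies objects to another bucket."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [{
            "ID": "dr-cross-region-copy",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},  # empty prefix matches every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
        }],
    }


# Hypothetical names: the destination bucket lives in a second region
config = replication_config(
    "arn:aws:iam::111122223333:role/s3-replication",
    "example-backups-eu-west-1",
)
print(config["Rules"][0]["Destination"]["Bucket"])
```

With boto3 installed and credentials configured, this could be passed as `ReplicationConfiguration` to `put_bucket_replication` on the source bucket. Versioning has to be enabled on both the source and the destination bucket for replication to work.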

Know the Difference Between AWS Backup and Disaster Recovery

The best storage and backup solutions won’t save you unless you have a disaster recovery plan and know how to access and extract the data from the AWS cloud.

Turn to the Third-Party Developers

AWS has many reliable partners and tools for app maintenance. There are also free online technical courses. However, professional disaster recovery planning requires third-party help and assistance.

The modern world is full of dangers to your business. As you already know, a disaster can be caused by anything from a typo to a destructive hurricane. It’s important to take the time to discuss your recovery time and recovery point objectives, choose the right storage solution, and develop a reliable AWS disaster recovery plan.

Author

Ostap Demkovych

Delivery Manager, Romexsoft

FAQ

What are some effective AWS DR strategies for my disaster recovery plan?

AWS offers several disaster recovery strategies that you can incorporate into your disaster recovery plan. These include the Pilot Light scenario, where a minimal version of your environment is always running in the cloud, the Warm Standby scenario, which involves a scaled-down version of your production infrastructure always running in the cloud, and the Multi-site Deployment scenario, where business-critical data and core infrastructure components are replicated across several on-premises or cloud locations.

How can I optimize costs in my AWS disaster recovery plan?

Cost optimization in your AWS disaster recovery plan can be achieved through several strategies. Right Sizing ensures you're using the most cost-effective instances for your needs. Reserved Instances offer significant discounts for predictable workloads. Auto Scaling adjusts the number of Amazon EC2 instances based on the load for your application, improving performance and reducing costs. Implementing lifecycle policies can move infrequently accessed data to cheaper storage classes, saving costs.

What should I include in my AWS disaster recovery plan template?

An effective AWS disaster recovery plan template should include a detailed description of your infrastructure, the involvement of your entire development team, the importance of each infrastructure element, discussions on RTO and RPO with stakeholders, and strategies for cost optimization. Regular testing of your plan is crucial to ensure it works as expected when disaster strikes. Automation, monitoring, and alerting are also key components of a robust AWS disaster recovery plan.

What are some best practices for disaster recovery for AWS?

Best practices for disaster recovery for AWS include regular testing of your disaster recovery plan, monitoring the state of your AWS resources, setting up alerts for anomalies or potential issues, and automation to reduce manual intervention and human error. Keeping detailed documentation of your disaster recovery plan and procedures is also crucial. Remember, disaster recovery is not a set-and-forget task. Regularly review and update your plan to accommodate changes in your business and technology environment.