Step-By-Step Guide to Developing a Disaster Recovery Plan with AWS

Want to implement an AWS Disaster Recovery Plan for your solution, but don’t know where to start?
Here we outline 8 main steps that will help you implement a Disaster Recovery Plan on AWS successfully:
1. Identify and describe all of your infrastructure;
2. Involve the entire development team;
3. Identify the importance of each infrastructure element;
4. Discuss RTO and RPO with stakeholders;
5. Optimize the cost of your disaster recovery setup;
6. Use your accumulated information to create the plan;
7. Establish the in-house communication network;
8. Test and re-test.
This blog will help you learn how to create an efficient and cost-effective disaster recovery plan.

Did you know that 75% of SMEs still operate without a disaster recovery plan in place?

There could be a valid explanation for this, though. Many organizations today still stick with an on-site disaster recovery policy, which usually involves housing all critical data at an off-site location or two, often in a reciprocal arrangement with another organization. But things have changed, and there is now a much more efficient and cost-effective solution: cloud disaster recovery, and in particular, using Amazon Web Services (AWS) as a disaster recovery site.

But simply being in the cloud today and assuming all will be well is not the solution. You need a comprehensive plan so that you know what your total infrastructure is, where it is, what is critical, what should be moved to the cloud, and what specific AWS cloud services you need in order to resume operations as quickly as possible.

If you follow an organized plan for disaster recovery development and then actually implement that plan, you should be able to ensure business continuity no matter what. Below is your step-by-step guide for developing a disaster recovery plan with AWS.

Step 1: Identify and Describe All of Your Infrastructure

Remember this: your infrastructure includes all of your hardware, software, networks, storage, your current serverless solutions, and your production and sandbox environments. If you already have cloud infrastructure in AWS, then be certain to use the AWS Tag Editor to find all of the resources you have there and properly map them.
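If your resources are tagged, the inventory that Tag Editor shows can also be pulled programmatically. The sketch below is one possible approach, assuming the boto3 SDK is installed and AWS credentials are configured. The helper names `group_arns_by_service` and `list_tagged_arns` are illustrative and not part of AWS.

```python
from collections import defaultdict


def group_arns_by_service(arns):
    """Group ARNs by their service segment (arn:partition:service:...)."""
    grouped = defaultdict(list)
    for arn in arns:
        parts = arn.split(":")
        service = parts[2] if len(parts) > 2 else "unknown"
        grouped[service].append(arn)
    return dict(grouped)


def list_tagged_arns(region_name):
    """Return ARNs reported by the Resource Groups Tagging API for one region.

    Assumption: boto3 is installed and credentials are configured.
    """
    import boto3  # imported lazily so the grouping helper works without it

    client = boto3.client("resourcegroupstaggingapi", region_name=region_name)
    arns = []
    for page in client.get_paginator("get_resources").paginate():
        arns.extend(item["ResourceARN"] for item in page["ResourceTagMappingList"])
    return arns
```

Resources that were never tagged may not appear in this listing, so treat the result as a starting point for your map rather than a complete inventory.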

If you do not, then you have some additional work cut out for you. Take your time to identify all of your infrastructure so you can be certain you have left nothing out.

Step 2: Involve the Entire Development Team

Whether you have an in-house team or are using IT development services on a contractual basis, those responsible for that development must be consulted. You have to know what dependencies exist among your infrastructure elements. And don’t forget third-party resources you may be using (e.g. the Google Maps API). You can use the AWS Status Page and your Personal Health Dashboard to get additional insight into planned maintenance and updates. Later, this process can be automated using the AWS Health API.

Establishing dependencies and mapping infrastructure is a time-consuming process and one that must occur over time. It’s a bit like a grocery list – you keep adding to it as new items come to mind.
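Once the dependency list starts to take shape, it can be kept as data and used to work out a sensible recovery order. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The component names are hypothetical placeholders, not taken from any real system.

```python
from graphlib import TopologicalSorter

# Hypothetical components: each key depends on the items in its set.
DEPENDENCIES = {
    "web-frontend": {"orders-api"},
    "orders-api": {"customer-db", "maps-api"},
    "customer-db": set(),
    "maps-api": set(),  # third-party service you do not control
}


def recovery_order(dependencies):
    """Return components ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `recovery_order(DEPENDENCIES)` places `customer-db` and `maps-api` ahead of `orders-api`, and `orders-api` ahead of `web-frontend`. A circular dependency raises `graphlib.CycleError`, which is itself a useful finding during mapping.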

Step 3: Identify the Importance of Each Infrastructure Element

A database that houses all of your current customer information and interaction history is obviously more important than one housing your “dead” customer information. These elements should be prioritized because that prioritization will matter when you determine the elements to be recovered and the given time frame in which you want to recover them. Those that can be recovered more slowly, for example, will cost you less.

It’s best to ask yourself the question, “What would happen if ________ (name of an element) went down?” Listing the consequences will help you prioritize their level of importance.

Step 4: Discuss RTO and RPO with Stakeholders

People in your customer service department have needs related to:

  • RTO (recovery time objective): when they can expect to be back up and running.
  • RPO (recovery point objective): the point in the past to which their data will be restored once recovery occurs.

Your HR people will have a different view, as will the warehouse management staff. Ideally, RTO would be equally rapid for all, but that is probably not realistic. These, too, must be prioritized. The more rapid the RTO need, for example, the more expensive the recovery will be. Striking the optimal balance between the “wants” and “needs” will be essential. In our previous post, we shared several effective cost optimization strategies for AWS that can be applied to DR as well.
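One way to make these conversations concrete is to record each department's RTO and RPO and map them to a class of DR approach. The sketch below is illustrative only: the thresholds are assumptions chosen for the example, not AWS guidance, and the `RecoveryTarget` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    name: str
    rto_minutes: int  # how long the team can wait to be back online
    rpo_minutes: int  # how much recent data the team can afford to lose


def suggest_strategy(target):
    """Map the stricter of RTO and RPO to a DR approach (assumed thresholds)."""
    tightest = min(target.rto_minutes, target.rpo_minutes)
    if tightest <= 5:
        return "multi-site"
    if tightest <= 60:
        return "warm standby"
    if tightest <= 8 * 60:
        return "pilot light"
    return "backup and restore"
```

For example, `suggest_strategy(RecoveryTarget("customer service", 30, 15))` returns `"warm standby"`, while an HR system that can tolerate a full day of downtime and data loss falls through to `"backup and restore"`.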

Step 5: Cost Optimization in AWS Disaster Recovery

Implementing a disaster recovery plan on AWS doesn’t have to break the bank. With the right strategies, you can ensure high availability and data protection while keeping costs under control. Here are some tips for cost optimization in AWS disaster recovery:

Right Sizing

Right Sizing involves ensuring that you are using the most cost-effective instances to meet your needs. AWS provides a variety of instance types optimized to fit different use cases.

Reserved Instances

Reserved Instances can be used for predictable workloads, offering significant discounts compared to on-demand instance pricing.

Auto Scaling

With Auto Scaling, you can ensure that you have the right number of Amazon EC2 instances available to handle the load for your application. This not only improves performance and availability but also reduces cost by scaling down automatically when demand is low.

Lifecycle Policies

Implementing lifecycle policies can help move infrequently accessed data to cheaper storage classes to save costs.
By implementing these strategies, you can strike a balance between ensuring business continuity and managing costs in your AWS disaster recovery plan.

Data Transfer Optimization

Data transfer costs can add up quickly in a disaster recovery setup. Optimize your data transfer by using services like AWS Direct Connect and data transfer methods like compression and deduplication.

Storage Optimization

Choose the right storage class for your data. AWS offers several storage classes in S3, each with its own pricing model. Use lifecycle policies to automatically move data to cheaper storage classes when it’s not accessed frequently.
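A lifecycle policy of this kind can be expressed as a small configuration document. The sketch below builds one and, optionally, applies it with boto3. The rule ID, prefix, and day counts are assumptions for illustration, and `apply_lifecycle` requires boto3 and configured credentials.

```python
def build_backup_lifecycle(prefix, ia_after_days=30, glacier_after_days=90,
                           expire_after_days=365):
    """Build an S3 lifecycle configuration that tiers and then expires backups."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day counts must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": "dr-backup-tiering",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle(bucket, configuration):
    """Apply the configuration to a bucket (assumes boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

Keeping the builder separate from the API call makes it easy to review the policy, or check it in a test, before it touches a real bucket.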

Automate Cleanup

Automate the cleanup of old backups and snapshots that are no longer needed. This can be done using lifecycle policies in S3 and automated scripts for EBS snapshots.
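For EBS snapshots, an automated cleanup script can be as simple as selecting snapshots older than a retention window and deleting them. This is a hedged sketch: the retention period is an assumption you would set from your RPO, the function names are illustrative, and `delete_expired_snapshots` requires boto3 and configured credentials.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshot_ids(snapshots, retention_days, now=None):
    """Return IDs of snapshots whose StartTime is older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


def delete_expired_snapshots(region_name, retention_days):
    """Delete this account's expired snapshots in one region."""
    import boto3  # imported lazily so the selection logic works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    snapshots = []
    for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
        snapshots.extend(page["Snapshots"])
    for snapshot_id in expired_snapshot_ids(snapshots, retention_days):
        ec2.delete_snapshot(SnapshotId=snapshot_id)
```

It is worth running the selection step on its own first and reviewing the list before enabling deletion, since a deleted snapshot cannot be restored.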

Remember, cost optimization is a continuous process. Regularly review and adjust your AWS resources and configurations to ensure you’re getting the most value for your money.

Step 6: Use Your Accumulated Information to Create Your Plan

Every organization is unique, and so is the disaster recovery plan for its infrastructure. It could be as simple as weekly or monthly backups, or as complex as maintaining an exact and immediate copy of your infrastructure in another cloud, which is something a bank will do, for example.

Here is where you make use of all of the cloud-based tools available to you. Your task is to determine which of these tools will satisfy your DR needs. AWS provides flexible solutions that can be customized based on your needs. For instance, you can leverage Amazon RDS to create Multi-AZ deployments and automate backups with built-in rotation to create a basic DR plan. This option is best for business owners who are not tech-savvy. You can also browse AWS Marketplace for disaster-recovery-as-a-service solutions pre-made by others, such as remote backup tools or automated environment duplication. No matter which option you choose, there are some universal factors worth considering prior to adoption:

Downtime Cost vs. Backup/Recovery Cost: There will be choices to make between the projected income loss during downtime and the cost of backup and restoration of your data. If you can withstand a longer downtime period, you can select a slower, less expensive option, but if you cannot, then you will want something like the more expensive duplicate production environment.

How much data loss is acceptable? Do you need ongoing and immediate backup or can you afford to lose several hours’ worth?
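The trade-off between downtime cost and DR cost comes down to simple arithmetic. The sketch below compares options by adding each option's yearly DR spend to the revenue expected to be lost while recovering. All figures and option names are made-up assumptions for illustration.

```python
def expected_annual_downtime_cost(revenue_per_hour, outages_per_year, hours_per_outage):
    """Revenue expected to be lost to downtime over one year."""
    return revenue_per_hour * outages_per_year * hours_per_outage


def cheapest_option(revenue_per_hour, outages_per_year, options):
    """Pick the option with the lowest total yearly cost.

    options maps a name to (annual_dr_cost, recovery_hours).
    Returns (name, total_annual_cost).
    """
    totals = {
        name: annual_cost + expected_annual_downtime_cost(
            revenue_per_hour, outages_per_year, recovery_hours
        )
        for name, (annual_cost, recovery_hours) in options.items()
    }
    best = min(totals, key=totals.get)
    return best, totals[best]


# Hypothetical numbers: yearly DR cost and hours needed to recover.
OPTIONS = {
    "weekly backups": (3000, 24),
    "warm standby": (30000, 1),
}
```

With revenue of 2,000 per hour and two outages a year, `cheapest_option(2000, 2, OPTIONS)` returns `("warm standby", 34000)`, because backups alone would cost 99,000 once lost revenue is included. At 100 per hour the result flips to `("weekly backups", 7800)`.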

Which specific backup options are best suited to your circumstances? Within EC2, for example, you can choose between Amazon Machine Images (AMIs) and EBS snapshots. In general, instance store-based AMIs are slower, less flexible, and more expensive than EBS snapshots. But there are also strengths that come with that additional cost.

How will you automate your backups, and how should you choose an additional region for copies of those backups? There is a wealth of AWS tools for automating your backups so that you can rest easy knowing that the process goes on without your intervention. And you can literally select an additional region for backup half a world away. You will want to make use of AWS disaster recovery management tools, many of which are available with a few clicks in the AWS console. There are also custom solutions available via AWS Marketplace, with options ranging from “pilot light” to “hot standby.”
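Copying a backup to a second region can be scripted in a few lines. The following is a sketch rather than a complete backup tool: it assumes boto3 is installed, credentials are configured, and the snapshot ID and region names are supplied by you. The helper names are illustrative.

```python
def build_copy_request(snapshot_id, source_region, description="DR copy"):
    """Build the arguments for an EC2 CopySnapshot call."""
    return {
        "SourceRegion": source_region,
        "SourceSnapshotId": snapshot_id,
        "Description": description,
    }


def copy_snapshot_to_region(snapshot_id, source_region, destination_region):
    """Copy an EBS snapshot into another region and return the new snapshot ID."""
    if source_region == destination_region:
        raise ValueError("destination region must differ from the source region")

    import boto3  # imported lazily so the request builder works without it

    # The copy is requested from a client bound to the destination region.
    ec2 = boto3.client("ec2", region_name=destination_region)
    response = ec2.copy_snapshot(**build_copy_request(snapshot_id, source_region))
    return response["SnapshotId"]
```

Running a function like this on a schedule after each backup keeps an off-region copy available even if the primary region is unreachable.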

Step 7: Establish the In-House Communication Network

What happens in-house when a failure occurs? Who is responsible for monitoring server parameters and alerts? Amazon CloudWatch supports monitoring through CloudWatch Events and Lambda. And there are plenty of additional IT infrastructure monitoring tools worth considering.
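As an illustration of the alerting side, the sketch below defines a CloudWatch alarm on EC2 CPU utilization that notifies an SNS topic. The threshold, period, and alarm name are assumptions to adjust for your workload, the instance ID and topic ARN are placeholders you would supply, and `create_alarm` requires boto3 and configured credentials.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold_percent=90.0):
    """Build parameters for an alarm on sustained high CPU for one instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per data point
        "EvaluationPeriods": 2,   # two consecutive breaches trigger the alarm
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(region_name, alarm):
    """Create or update the alarm in CloudWatch."""
    import boto3  # imported lazily so the builder works without it

    boto3.client("cloudwatch", region_name=region_name).put_metric_alarm(**alarm)
```

Whoever subscribes to the SNS topic becomes the first responder, so the subscription list is effectively part of your communication plan.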

But the big question is: who will be responsible for round-the-clock monitoring and execution of your DR plan when things go awry all of a sudden? Again, you have several choices:

  • Re-assign developers from your in-house team to monitor and fine-tune your infrastructure and run DR scenarios. However, this means taking people away from other projects and paying them overtime when things get messy on weekends, for instance.
  • Hire a DevOps support team, who will manage your IT support 24/7, report on new findings and continuously optimize your infrastructure performance. Typically, this option involves fewer costs for your business in the long run, as well as less stress on your internal teams.

Step 8: Testing and Re-Testing

If you don’t test your plan, you are asking for trouble. Setting up DR is just part of the job. It’s far more important to ensure that those measures will work when a disaster strikes. Again, AWS has some neat tools for staging and testing your solutions. For instance, you can create a duplicate environment where you can stage real-world scenarios and see how your systems will perform. Such testing should occur on a regular basis.
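A drill is only useful if its results are compared against the targets agreed with stakeholders. The sketch below shows one simple way to record that comparison. The function name and message wording are illustrative assumptions.

```python
def evaluate_drill(rto_minutes, rpo_minutes,
                   measured_recovery_minutes, measured_data_loss_minutes):
    """Return a list of missed targets; an empty list means the drill passed."""
    failures = []
    if measured_recovery_minutes > rto_minutes:
        failures.append(
            f"RTO missed: recovered in {measured_recovery_minutes} min, "
            f"target {rto_minutes} min"
        )
    if measured_data_loss_minutes > rpo_minutes:
        failures.append(
            f"RPO missed: lost {measured_data_loss_minutes} min of data, "
            f"target {rpo_minutes} min"
        )
    return failures
```

Logging the output of each drill over time shows whether recovery is getting faster or drifting away from the agreed targets.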

Disaster Recovery Is No Longer an Option

Given the increase in natural disasters, not to mention system breaches, ransomware, and the like, a disaster recovery plan is a “must” in this new physical and digital environment. All of this can seem pretty daunting to a small or mid-sized organization, especially when there is no in-house IT team on board. And yet, the need for a robust disaster recovery solution is there.

AWS provides a comprehensive set of solutions for disaster recovery, but understanding those solutions and choosing those that meet your needs and your budget constraints will require a great deal of study, discussion, and planning.

Just developing the plan can prove challenging, because those involved need enough understanding of cloud-based DR services, what is available, and how the myriad solution options can be pared down to fit an individual circumstance.

While AWS has an extensive tutorial system in place and an easy user interface, it cannot make decisions and choices for you.


As a small business, the right first choice may be to contract with a third party to assist in the development of your disaster recovery plan, to recommend the best AWS DevOps Services and Solutions, and then to implement those solutions for you. That same third party can manage your solutions as well, providing 24/7 IT support, automated backup setup, alerts, and more.

Romexsoft has dedicated teams in place to provide full AWS disaster recovery services to SMEs. Get in touch today and let us evaluate your situation!

AWS Disaster Recovery FAQ

What are some effective AWS DR strategies for my disaster recovery plan?

AWS offers several disaster recovery strategies that you can incorporate into your disaster recovery plan. These include the Pilot Light scenario, where a minimal version of your environment is always running in the cloud, the Warm Standby scenario, which involves a scaled-down version of your production infrastructure always running in the cloud, and the Multi-site Deployment scenario, where business-critical data and core infrastructure components are replicated across several on-premises or cloud locations.

How can I optimize costs in my AWS disaster recovery plan?

Cost optimization in your AWS disaster recovery plan can be achieved through several strategies. Right Sizing ensures you're using the most cost-effective instances for your needs. Reserved Instances offer significant discounts for predictable workloads. Auto Scaling adjusts the number of Amazon EC2 instances based on the load for your application, improving performance and reducing costs. Implementing lifecycle policies can move infrequently accessed data to cheaper storage classes, saving costs.

What should I include in my AWS disaster recovery plan template?

An effective AWS disaster recovery plan template should include a detailed description of your infrastructure, the involvement of your entire development team, the importance of each infrastructure element, discussions on RTO and RPO with stakeholders, and strategies for cost optimization. Regular testing of your plan is crucial to ensure it works as expected when disaster strikes. Automation, monitoring, and alerting are also key components of a robust AWS disaster recovery plan.

What are some best practices for disaster recovery for AWS?

Best practices for disaster recovery for AWS include regular testing of your disaster recovery plan, monitoring the state of your AWS resources, setting up alerts for anomalies or potential issues, and automation to reduce manual intervention and human error. Keeping detailed documentation of your disaster recovery plan and procedures is also crucial. Remember, disaster recovery is not a set-and-forget task. Regularly review and update your plan to accommodate changes in your business and technology environment.

Ostap Demkovych, Sr. Delivery Manager, Senior Application Architect at Romexsoft | AWS Certified Solutions Architect | Oracle Certified Associate, Java SE 8 Programmer | Responsible for Java development and Big Data Services.