Step-By-Step Guide to Developing a Disaster Recovery Plan with AWS

    Want to implement a Disaster Recovery Plan for your solution, but don’t know where to start?
    Here we outline 7 main steps that will help you implement a Disaster Recovery Plan on AWS successfully:
    1. Identify and describe all of your infrastructure;
    2. Involve the entire development team;
    3. Identify the importance of each infrastructure element;
    4. Discuss RTO and RPO with stakeholders;
    5. Use your accumulated information to create the plan;
    6. Establish the in-house communication network;
    7. Test and re-test.
    This blog will help you learn how to create an efficient and cost-effective disaster recovery plan.

Did you know that 75% of SMEs still operate without a disaster recovery plan in place?

There could be a valid explanation for this, though. A lot of organizations today still stick with an on-site disaster recovery policy, which usually involves housing all critical data at an off-site location or two, often in a reciprocal arrangement with another organization. But things have changed, and there is now a much more efficient and cost-effective solution: cloud disaster recovery, and in particular, using Amazon Web Services (AWS) as a disaster recovery site.

But simply being in the cloud today and assuming all will be well is not the solution. You need a comprehensive plan so that you know what your total infrastructure is, where it is, what is critical, what should be moved to the cloud, and what specific AWS cloud services you need in order to resume operations as quickly as possible.

If you follow an organized plan for disaster recovery development and then actually implement that plan, you should be able to ensure business continuity no matter what. Below is your step-by-step guide for developing a disaster recovery plan with AWS.

Step 1: Identify and Describe All of Your Infrastructure

Remember this: your infrastructure includes all of your hardware, software, networks, storage, your current serverless solutions, and your production and sandbox environments. If you already have cloud infrastructure in AWS, then be certain to use the AWS Tag Editor to find all of the resources you have there and properly map them.

If you do not, then you have some additional work cut out for you. Take your time to identify all of your infrastructure so that you can be certain nothing has been left out.
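As a rough illustration of the mapping step, here is a minimal Python sketch that groups resource ARNs by service so that gaps in the inventory are easier to spot. The helper names (`service_of`, `group_by_service`, `fetch_tagged_arns`) are our own, not part of any AWS tool. The last function assumes boto3 and AWS credentials are available and uses the Resource Groups Tagging API, which, to our knowledge, only reports resources that are or once were tagged, so treat its output as a starting point rather than a complete list.

```python
from collections import defaultdict


def service_of(arn: str) -> str:
    """Return the service part of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[2]


def group_by_service(arns):
    """Group ARNs by AWS service name."""
    inventory = defaultdict(list)
    for arn in arns:
        inventory[service_of(arn)].append(arn)
    return dict(inventory)


def fetch_tagged_arns(region: str):
    """List ARNs through the Resource Groups Tagging API (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    arns = []
    for page in client.get_paginator("get_resources").paginate():
        arns.extend(item["ResourceARN"] for item in page["ResourceTagMappingList"])
    return arns
```

Calling `group_by_service(fetch_tagged_arns("us-east-1"))` would give you one list per service for that region; repeat it for every region you use.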

Step 2: Involve the Entire Development Team

Whether you have an in-house team or are using IT development services on a contractual basis, those responsible for that development must be consulted. You have to know what dependencies exist among your infrastructure elements. And don’t forget the third-party resources you may be using (e.g. Google Maps API) as well. You can use the AWS status page and your Personal Health Dashboard to receive additional insights about planned maintenance and updates. Later, this process can be automated using the AWS Health API.

Establishing dependencies and mapping infrastructure is a time-consuming process and one that must occur over time. It’s a bit like a grocery list – you keep adding to it as new items come to mind.
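Once dependencies are written down, they can also tell you in which order to bring things back. The sketch below is one possible way to do that with Python’s standard `graphlib` module (Python 3.9+). The element names in `DEPENDS_ON` are hypothetical examples, not a recommendation for how your infrastructure should look.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each element lists what must be running before it.
DEPENDS_ON = {
    "web-frontend": {"orders-api", "maps-api (third party)"},
    "orders-api": {"customer-db", "message-queue"},
    "customer-db": set(),
    "message-queue": set(),
    "maps-api (third party)": set(),
}


def recovery_order(depends_on):
    """Return elements so that every dependency appears before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, element in enumerate(recovery_order(DEPENDS_ON), start=1):
        print(step, element)
```

If two elements depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful signal that the map needs another look.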

Step 3: Identify the Importance of Each Infrastructure Element

A database that houses all of your current customer information and interaction history is obviously more important than one housing your “dead” customer information. These elements should be prioritized because that prioritization will matter when you determine the elements to be recovered and the given time frame in which you want to recover them. Those that can be recovered more slowly, for example, will cost you less.

It’s best to ask yourself the question, “What would happen if ________ (name of an element) went down?” Listing the consequences for each element will help you rank them by importance.
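One simple way to turn those answers into a ranking is to score each element and sort. The following sketch assumes a made-up 0 to 5 scale for revenue and customer impact; the `Element` class, its fields, and the scoring rule are illustrative only, and you may well want different criteria.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    revenue_impact: int   # 0 (none) to 5 (sales stop); hypothetical scale
    customer_impact: int  # 0 (invisible) to 5 (customers locked out); hypothetical scale
    consequence: str      # your answer to "what would happen if it went down?"

    @property
    def score(self) -> int:
        return self.revenue_impact + self.customer_impact


def prioritize(elements):
    """Return elements with the most critical first."""
    return sorted(elements, key=lambda e: e.score, reverse=True)


if __name__ == "__main__":
    inventory = [
        Element("dead-customer-db", 0, 1, "old records unavailable for a while"),
        Element("current-customer-db", 5, 5, "no orders, no support history"),
    ]
    for element in prioritize(inventory):
        print(element.score, element.name, "-", element.consequence)
```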

Step 4: Discuss RTO and RPO with Stakeholders

People in your customer service department have needs related to:

  • RTO (recovery time objective): when they can expect to be back up and running;
  • RPO (recovery point objective): the point in the past to which their data will be restored once recovery occurs.

Your HR people will have a different view, as will the warehouse management staff. Ideally, RTO would be equally rapid for all, but that is probably not realistic. These, too, must be prioritized. The more rapid the RTO need, for example, the more expensive the recovery will be. Striking the optimal balance between the “wants” and “needs” will be essential. In our previous post, we shared several effective cost optimization strategies for AWS that can be applied to DR as well.
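To make the conversation concrete, it can help to write each team’s objectives down and check them against a proposed backup schedule. The sketch below assumes simple periodic backups, where a failure just before the next backup loses up to one full interval of data. The `Objective` class and the example figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objective:
    team: str
    rto: timedelta  # how long the team can wait to be back up and running
    rpo: timedelta  # how much recent data the team can afford to lose


def rpo_violations(objectives, backup_interval):
    """Teams whose RPO is tighter than the worst-case loss of one backup interval."""
    return [o.team for o in objectives if backup_interval > o.rpo]


def strictest(objectives):
    """The tightest RTO and RPO across all teams."""
    return min(o.rto for o in objectives), min(o.rpo for o in objectives)


if __name__ == "__main__":
    teams = [
        Objective("customer service", timedelta(hours=1), timedelta(minutes=15)),
        Objective("HR", timedelta(hours=24), timedelta(hours=12)),
    ]
    print(rpo_violations(teams, backup_interval=timedelta(hours=1)))
```

A team that shows up in the violations list either needs more frequent backups for its data or has to accept a looser RPO.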

Step 5: Use Your Accumulated Information to Create Your Plan

Every organization is unique, and so is the disaster recovery plan for its infrastructure. It could be as simple as weekly or monthly backups, or as complex as maintaining an exact and immediate copy of your infrastructure in another cloud, which is something a bank will do, for example.

Here is where you make use of all of the cloud-based tools available to you. Your task is to determine which of these tools will satisfy your DR needs. AWS provides flexible solutions that can be customized based on your needs. For instance, you can leverage RDS to create Multi-AZ deployments and automate backups with built-in rotation to create a basic DR plan. This option is best for non-tech-savvy business owners. You can also browse AWS Marketplace for disaster-recovery-as-a-service solutions pre-made by others, such as remote backup tools or automated environment duplication. No matter which option you choose, there are some universal factors worth considering prior to adoption:

Downtime Cost vs. Backup/Recovery Cost: There will be choices to make between the projected income loss during downtime and the cost of backup and restoration of your data. If you can withstand a longer downtime period, you can select a slower, less expensive option, but if you cannot, then you will want something like the more expensive duplicate production environment.
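The trade-off can be put into rough numbers. The sketch below compares options by adding each one’s yearly running cost to the income you would expect to lose while recovering. All names and figures are hypothetical placeholders; the model deliberately ignores things like reputational damage and contractual penalties.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    name: str
    annual_cost: float     # yearly cost of running this option
    recovery_hours: float  # expected downtime per incident with this option


def expected_annual_total(option, loss_per_hour, incidents_per_year):
    """Running cost plus expected income lost to downtime over a year."""
    downtime_loss = option.recovery_hours * loss_per_hour * incidents_per_year
    return option.annual_cost + downtime_loss


def cheapest(options, loss_per_hour, incidents_per_year):
    """The option with the lowest expected yearly total."""
    return min(
        options,
        key=lambda o: expected_annual_total(o, loss_per_hour, incidents_per_year),
    )


if __name__ == "__main__":
    options = [
        DrOption("backup and restore", annual_cost=2_000, recovery_hours=24),
        DrOption("duplicate production environment", annual_cost=60_000, recovery_hours=0.25),
    ]
    for loss_per_hour in (500, 20_000):
        best = cheapest(options, loss_per_hour, incidents_per_year=0.5)
        print(loss_per_hour, "per hour ->", best.name)
```

With these made-up figures, the slower option wins when an hour of downtime costs 500, and the duplicate environment wins when it costs 20,000.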

How much data loss is acceptable? Do you need ongoing and immediate backup or can you afford to lose several hours’ worth?

Which specific backup options are best suited to your circumstances? Within EC2, for example, you have a choice between Amazon Machine Images (AMIs) and EBS snapshots. In general, instance store-based AMIs are slower and less flexible and cost more than EBS snapshots. But there are also strengths that come with that additional cost.

How will you automate your backups, and how should you choose an additional region for copies of those backups? There is a wealth of AWS tools for automating your backups so that you can rest easy knowing that the process goes on without your intervention. And you can literally select an additional region for backup half a world away. You will want to make use of AWS disaster recovery management tools, many of which can be had with a few clicks in your cloud provider console. There are also custom solutions available via AWS Marketplace, with options ranging from “pilot light” to “hot standby.”
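As a minimal sketch of what such automation might look like if you scripted it yourself, the code below picks EBS snapshots that have aged out of a retention window and copies a snapshot into a second region. The function names and the retention rule are our own assumptions. `copy_to_region` needs boto3 and AWS credentials, and managed services may well be a better fit than hand-written scripts.

```python
from datetime import datetime, timedelta, timezone


def expired(snapshots, keep_days, now=None):
    """Return IDs of snapshots older than the retention window.

    `snapshots` is a list of dicts with "SnapshotId" and a timezone-aware
    "StartTime", the same keys EC2's describe_snapshots returns.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


def copy_to_region(snapshot_id, source_region, target_region):
    """Copy an EBS snapshot into another region (needs boto3 and credentials)."""
    import boto3  # imported lazily so expired() works without it

    # The copy request is sent to the destination region.
    ec2 = boto3.client("ec2", region_name=target_region)
    response = ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]
```

A scheduled job could call `copy_to_region` for each new snapshot and then delete whatever `expired` returns, once you are sure the copies have completed.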

Step 6: Establish the In-House Communication Network

What happens in-house when a failure occurs? Who is responsible for monitoring server parameters and alerts? Amazon CloudWatch allows monitoring through CloudWatch Events and Lambda, and there are plenty of additional IT infrastructure monitoring tools worth considering.
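To give a feel for the CloudWatch Events and Lambda combination, here is a small sketch of a Lambda handler that reacts to EC2 instance state-change events. It assumes an event rule that forwards those events to the function. The `summarize` helper and the decision to simply print the alert are placeholders; a real setup might notify whoever is on call instead.

```python
# States that we treat as worth an alert; adjust to your own needs.
ALERT_STATES = {"stopping", "stopped", "shutting-down", "terminated"}


def summarize(event):
    """Turn an EC2 state-change event into a one-line alert, or None if not relevant."""
    if event.get("source") != "aws.ec2":
        return None
    detail = event.get("detail", {})
    state = detail.get("state")
    if state not in ALERT_STATES:
        return None
    return f"EC2 instance {detail.get('instance-id')} is {state} in {event.get('region')}"


def handler(event, context):
    """Lambda entry point, invoked by the event rule."""
    message = summarize(event)
    if message:
        print(message)  # placeholder: forward to your alerting channel here
    return {"alerted": message is not None}
```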

But the big question is: who will be responsible for round-the-clock monitoring and execution of your DR plan when things go awry all of a sudden? Again, you have several choices:

  • Re-assign developers from your in-house team to monitor and fine-tune your infrastructure, and run DR scenarios. However, this means that you’ll be taking people away from other projects, and will have to pay them overtime if things get messy on weekends, for instance.
  • Hire a DevOps support team that will manage your IT support 24/7, report on new findings, and continuously optimize your infrastructure performance. Typically, this option involves fewer costs for your business in the long run, as well as less stress on your internal teams.

Step 7: Testing and Re-Testing

If you don’t test your plan, you are asking for trouble. Setting up DR is just part of the job. It’s far more important to ensure that those measures will work when a disaster strikes. Again, AWS has some neat tools for staging and testing your solutions. For instance, you can create a duplicate environment where you can stage real-world scenarios and see how your systems will perform. Such testing procedures should occur on a regular basis.
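A drill is most useful when its result is compared with the RTO and RPO you agreed on in Step 4. The sketch below shows one way to record that comparison; `evaluate_drill` and its parameters are hypothetical, and the timestamps would come from your own drill log.

```python
from datetime import datetime, timedelta


def evaluate_drill(failure_at, restored_at, last_backup_at, rto, rpo):
    """Compare what a drill achieved against the agreed RTO and RPO."""
    achieved_rto = restored_at - failure_at    # how long recovery took
    achieved_rpo = failure_at - last_backup_at  # how much recent data was lost
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto,
        "rpo_met": achieved_rpo <= rpo,
    }


if __name__ == "__main__":
    simulated_failure = datetime(2024, 1, 1, 12, 0)
    result = evaluate_drill(
        failure_at=simulated_failure,
        restored_at=simulated_failure + timedelta(minutes=90),
        last_backup_at=simulated_failure - timedelta(minutes=30),
        rto=timedelta(hours=1),
        rpo=timedelta(hours=1),
    )
    print(result)
```

Keeping these results from drill to drill shows whether your recovery is getting faster or slowly drifting away from the targets.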

Disaster Recovery Is No Longer Optional

Just given the increase in natural disasters, not to mention breaches of systems, ransomware, etc., a disaster recovery plan is a “must” in this new physical and digital environment. All of this can seem pretty daunting to a small or mid-sized organization, especially when an in-house IT team is not on board. And yet, the need for a robust disaster recovery solution is there.

AWS provides a comprehensive set of solutions for disaster recovery, but understanding those solutions and choosing those that meet your needs and your budget constraints will require a great deal of study, discussion, and planning.

Just developing the plan can prove challenging, because those involved need enough understanding of cloud-based DR services, what is available, and how the myriad solution options can be pared down to fit an individual circumstance.

While AWS has an extensive tutorial system in place and an easy user interface, it cannot make decisions and choices for you.

As a small business, the right first choice may be to contract with a third party to assist in the development of your disaster recovery plan, to recommend the best AWS DevOps Services and Solutions, and then to implement those solutions for you. That same third party can manage your solutions as well, providing IT support 24/7, automated backup setup, alerts, and more.

Romexsoft has dedicated teams in place to provide full AWS disaster recovery services to SMEs. Get in touch today and let us evaluate your situation!

    Ostap Demkovych, Sr. Delivery Manager, Senior Application Architect at Romexsoft | AWS Certified Solutions Architect | Oracle Certified Associate, Java SE 8 Programmer | Responsible for Java development and Big Data Services.