Building Disaster Recovery Solution on AWS for SaaS

See how we built a fully functional pilot light DR environment that protects the client's SaaS infrastructure from downtime.

DevOps Services
HealthTech
USA

Executive Summary

Creating a Disaster Recovery Infrastructure

Our Customer

Pragma IT therapyBOSS is a comprehensive web and mobile SaaS platform for agencies and clinicians that lets its users manage the administrative and clinical aspects of home health therapy (Early Intervention, Physical Therapy, Speech Therapy, Skilled Nursing, etc.). It helps healthcare providers work more efficiently and stay compliant, saving time, cutting costs, and streamlining the operations of treating patients at home.

The Obstacles They Faced

The client’s main workloads run on-premises. To meet US healthcare compliance requirements, reduce restore time, recovery time objective (RTO), and recovery point objective (RPO), minimize the interruption of critical processes, and safeguard business operations, they needed a trustworthy and sustainable disaster recovery (DR) infrastructure.

How We Helped

Romexsoft worked on the therapyBOSS on-premises environments and software components and built a fully functional pilot light DR environment that protects the client’s SaaS infrastructure from prolonged downtime and thereby safeguards vital business operations.

The Challenges

Balancing Fast Recovery with Cost Efficiency

The challenge was to find the DR option for the SaaS that best balanced the fastest feasible restoration of the platform against the cost of the disaster recovery infrastructure itself.

For instance, the negative events that can affect on-premises environments include a hardware or software failure, a network or power outage, physical damage caused by fire or flooding, human error, or any other significant disaster that harms business continuity.

The Solution

Cloud-Based Disaster Recovery Model

How the application is built

The therapyBOSS application is written in Java and has a containerized, microservices-based architecture. The microservices communicate through REST APIs and event-driven messaging. Apache Kafka serves as the distributed event streaming platform. Galera Cluster for MySQL and MongoDB are used for data storage.

How the DR infrastructure is designed

After several workshops with the customer, Romexsoft suggested building a pilot light DR infrastructure in the US East (Ohio) AWS Region, far from the on-premises data center. This decision was driven by the client’s specific RTO, RPO, and TCO requirements for their application, as well as by the need for faster recovery of critical IT systems from any event that harms the Pragma IT business.

The pilot light disaster recovery approach was delivered by configuring and running the most critical core elements of the customer system in AWS. When the time for recovery comes, a full-scale production environment is rapidly provisioned in AWS around that critical core.

Ensuring data relevance and synchronization

To keep the DR data current, one of the Galera read replicas always runs on an AWS EC2 instance and stays synchronized with the main cluster in the data center. A similar approach is used for the MongoDB cluster. Additional Galera and MongoDB replicas will be provisioned on EC2 instances and synchronized as well.
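As an illustration of how replica freshness relates to the RPO, the following minimal Python sketch flags replicas whose lag exceeds an RPO budget. It is not taken from the project: the replica names, the lag figures, and the 30-second budget are hypothetical, and how the lag is actually measured for Galera or MongoDB is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaStatus:
    name: str           # hypothetical replica label
    lag_seconds: float  # how far the replica trails the primary (assumed to be measured elsewhere)


def replicas_breaching_rpo(statuses, rpo_seconds):
    """Return the names of replicas whose lag exceeds the RPO budget."""
    return [s.name for s in statuses if s.lag_seconds > rpo_seconds]


# Hypothetical readings, for illustration only.
statuses = [
    ReplicaStatus("galera-aws-replica", 2.0),
    ReplicaStatus("mongodb-aws-replica", 45.0),
]

breaches = replicas_breaching_rpo(statuses, rpo_seconds=30.0)
print(breaches)
```

A check of this kind could feed an alert, so that a replica drifting past the RPO budget is noticed before a disaster rather than during one.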

Data synchronization between on-premises and AWS runs over AWS Site-to-Site VPN. All other AWS services, such as the applications running in Fargate, Amazon MSK, the Jenkins server, and the bastion host, stay idle. At the moment of a disaster, the idle part of the AWS infrastructure is provisioned with Terraform, following the infrastructure as code (IaC) approach.
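The split between always-on and idle components can be pictured with a small Python sketch. It is only a model of the inventory described in this section, not the actual Terraform configuration: the `Mode` enum, the `Component` class, and the component labels are illustrative, and treating the VPN as always-on is inferred from its role in data synchronization.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ALWAYS_ON = "always_on"  # runs continuously as part of the pilot light
    IDLE = "idle"            # provisioned only when a disaster is declared


@dataclass(frozen=True)
class Component:
    name: str
    mode: Mode


# Illustrative inventory based on the components named in this case study.
PILOT_LIGHT = [
    Component("Galera replica on EC2", Mode.ALWAYS_ON),
    Component("MongoDB replica on EC2", Mode.ALWAYS_ON),
    Component("Site-to-Site VPN", Mode.ALWAYS_ON),  # assumption: needed for continuous sync
    Component("Fargate applications", Mode.IDLE),
    Component("Amazon MSK", Mode.IDLE),
    Component("Jenkins server", Mode.IDLE),
    Component("Bastion host", Mode.IDLE),
]


def to_provision_on_failover(components):
    """List the idle components that must be brought up during recovery."""
    return [c.name for c in components if c.mode is Mode.IDLE]


print(to_provision_on_failover(PILOT_LIGHT))
```

Keeping such an inventory explicit makes it easier to verify that every idle component has a matching IaC definition before it is needed.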

How the DR infrastructure is maintained

We agreed with the customer to run disaster recovery exercises for the staging environment every month. These exercises ensure:

confidence that the DR infrastructure always functions properly
consistency of the DR environment with the application as it evolves
tracking of and compliance with the agreed time range for restoring replicas of the on-premises infrastructure
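Tracking restoration time across exercises can be as simple as comparing each drill's duration with the RTO. The Python sketch below assumes the RTO of about one hour reported in this case study; the `DrillResult` class and the timestamps are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillResult:
    started: datetime   # when the recovery exercise began
    restored: datetime  # when the environment was confirmed restored

    @property
    def duration(self) -> timedelta:
        return self.restored - self.started


def meets_rto(drill: DrillResult, rto: timedelta) -> bool:
    """Return True if the drill finished within the RTO."""
    return drill.duration <= rto


RTO = timedelta(hours=1)  # target taken from the results reported in this case study

# Hypothetical drill timestamps, for illustration only.
drill = DrillResult(
    started=datetime(2024, 1, 15, 9, 0),
    restored=datetime(2024, 1, 15, 9, 48),
)

print(drill.duration, meets_rto(drill, RTO))
```

Recording one such result per monthly exercise gives a simple history that shows whether restoration time is drifting as the application grows.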

Disaster Recovery Solution for Healthcare SaaS Architecture Diagram

AWS Architecture Diagram: Disaster Recovery Solution for Healthcare SaaS.

Amazon Web Services Utilized

Elastic Compute Cloud (EC2)

Fargate

Simple Storage Service (S3)

Managed Streaming for Apache Kafka (MSK)

Site-to-Site VPN

Verified by AWS

This case study is validated by AWS. AWS experts and professional auditors reviewed it and verified that Romexsoft built a functional infrastructure and an efficient cloud solution.

It demonstrates that Romexsoft, a certified AWS Advanced Tier Services Partner, delivers cloud solutions according to AWS standards and best practices.

Check out Romexsoft’s profile at AWS Partner Network.

The Results

Minimized Downtime with Compliant DR Architecture

Overall, the AWS-based DR infrastructure designed and developed by Romexsoft holds the critical core of the customer’s SaaS, around which all other infrastructure pieces can be quickly provisioned to restore the complete system when the time comes.

With the implemented solution we also achieved:

compliance with US healthcare requirements
minimal interruption of critical processes
protection of vital business operations
cost-effectiveness of the whole DR infrastructure
assurance that systems and services are restored quickly (a recovery time objective (RTO) of about one hour and a recovery point objective (RPO) of seconds)

Why Romexsoft

Expert in Pilot Light Disaster Recovery

Romexsoft is an AWS-certified Consulting Partner, trusted Software Development Company and Managed Service Provider, founded in 2004. We help customer-centric companies build, run, and optimize their cloud systems on AWS with creative, elegant, and cost-efficient solutions.

Our key values

Delivery of quality solutions
Customer satisfaction
Long-term partnership

We have successfully delivered 100+ projects and have a proven track record in FinTech, HealthCare, AdTech, and Media industries.

Romexsoft holds a 5-star rating on Clutch for its strong expertise, responsiveness, and commitment. 60% of our clients have been working with us for over 4 years.

Disaster Recovery Solution on AWS FAQ

How does an AWS pilot light strategy ensure rapid recovery in disaster scenarios?

An AWS pilot light strategy involves maintaining a minimal version of a system, with core components always running. In the event of a disaster, the system can then be quickly scaled up to become fully operational, using the most recent data. Because this core is always on standby, recovery times are significantly shorter than with traditional methods, which supports business continuity with minimal disruption.

What are the key benefits of using AWS disaster recovery solutions over traditional DR methods?

AWS disaster recovery solutions offer several advantages over traditional DR methods. They provide flexibility in terms of scaling, allowing for cost-effective solutions that can be tailored to specific business needs. AWS DR solutions also ensure high availability, with multiple regions and zones to choose from, ensuring data integrity and availability even in the event of regional outages. Additionally, the pay-as-you-go model of AWS allows for cost savings, as businesses only pay for the resources they use.

How does integrating Infrastructure-as-Code (IaC) tools, like Terraform, optimize the DR process in AWS environments?

Integrating Infrastructure-as-Code tools, such as Terraform, into the DR process allows AWS resources to be provisioned automatically and consistently. This helps keep the DR environment in line with the production environment, minimizing potential discrepancies during recovery. IaC definitions can also be kept under version control, so that any changes to the infrastructure are tracked and can be rolled back if necessary. This level of automation and consistency makes the DR process smoother and more reliable.

In the context of an AWS pilot light strategy, why is continuous data synchronization pivotal for successful disaster recovery?

Continuous data synchronization is crucial in an AWS pilot light strategy because it keeps the standby environment updated with the most recent data from the primary system. In the event of a disaster, the recovery process therefore restores the most up-to-date version of the system, minimizing data loss and supporting business continuity. Continuous synchronization also reduces the risk of discrepancies between the primary and DR environments, which makes recovery smoother.