Building Disaster Recovery Solution on AWS for SaaS

Discover how we built a fully functional pilot light DR environment to protect the client's SaaS infrastructure from downtime.

  • DevOps Services
  • HealthTech
  • USA

Executive Summary

Our Customer

Pragma IT therapyBOSS is a comprehensive web and mobile SaaS platform for agencies and clinicians that allows its users to manage the administrative and clinical aspects of home health therapy (Early Intervention, Physical Therapy, Speech Therapy, Skilled Nursing, etc.). It helps healthcare providers be more efficient and compliant, saves time, cuts costs, and streamlines the operations of treating patients at home.

The Obstacles They Faced

The client’s main workloads run on-premises. To meet US healthcare compliance requirements, reduce restore time, recovery time objective (RTO), and recovery point objective (RPO), minimize the interruption of critical processes, and safeguard business operations, they needed to build a trustworthy and sustainable disaster recovery (DR) infrastructure.

How We Helped

Romexsoft worked with the therapyBOSS on-premises environments and software components and built a fully functional pilot light DR environment that protects the client’s SaaS infrastructure from prolonged downtime and thereby safeguards vital business operations.

The Challenges

The challenge was to find the DR option for the SaaS that strikes the right balance between the fastest feasible restoration of the platform and the cost-effectiveness of the disaster recovery infrastructure itself.

Negative events that can affect on-premises environments include a hardware or software failure, a network or power outage, physical damage caused by fire or flooding, human error, or any other significant disaster that harms business continuity.

The Solution

How the application is built

The therapyBOSS application is written in Java and has a microservices-based, containerized architecture. The microservices communicate through REST APIs and an event-driven approach. Apache Kafka is used as the distributed event streaming platform. Galera Cluster for MySQL and MongoDB are used as data storage solutions.

How the DR infrastructure is designed

After several workshops with the customer, Romexsoft suggested building a pilot light DR infrastructure in the US East (Ohio) AWS region, far from the on-premises data center. This decision was driven by the client’s specific RTO, RPO and TCO requirements for the application, and by the need to recover critical IT systems faster after any event that harms the Pragma IT business.

The pilot light disaster recovery approach was delivered by configuring and running the most critical core elements of the customer system in AWS. When the time for recovery comes, a full-scale production environment is rapidly provisioned in AWS around that critical core.

Ensuring data relevance and synchronization

To keep the data in the DR environment current, one of the Galera read replicas always runs on an AWS EC2 instance and remains synchronized with the main cluster in the data center. A similar approach is used for the MongoDB cluster. Additional Galera and MongoDB replicas will be provisioned on EC2 instances and synchronized as well.
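
One way to monitor this synchronization is to compare the replication lag of each AWS-side replica with the RPO target. The Python sketch below is illustrative only: the class and function names, the replica names, and the 30-second threshold are assumptions, not details of the therapyBOSS setup. In practice the lag values would be read from the databases' own replication status output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaStatus:
    name: str           # hypothetical replica identifier
    lag_seconds: float  # how far the replica is behind the primary


def replicas_violating_rpo(statuses, rpo_seconds):
    """Return the names of replicas whose lag exceeds the RPO target."""
    return [s.name for s in statuses if s.lag_seconds > rpo_seconds]


if __name__ == "__main__":
    # Example values are made up for illustration.
    statuses = [
        ReplicaStatus("galera-aws-replica", 2.5),
        ReplicaStatus("mongodb-aws-replica", 45.0),
    ]
    print(replicas_violating_rpo(statuses, rpo_seconds=30))
```

A check like this could run on a schedule and raise an alert whenever the returned list is not empty.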

Data is synchronized between the on-premises data center and AWS over AWS Site-to-Site VPN. All other AWS components, such as the applications running in Fargate, Amazon MSK, the Jenkins server, and the bastion host, remain idle. At the moment of disaster, the idle part of the AWS infrastructure is provisioned using the infrastructure as code (IaC) approach with Terraform.
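
The Terraform step can be scripted so that a failover does not depend on someone typing commands under pressure. The sketch below assembles the `terraform init` and `terraform apply` invocations in Python. The directory name `dr-environment` and the variable file `failover.tfvars` are hypothetical, and the case study does not describe how the real configuration is laid out or triggered.

```python
import subprocess


def build_terraform_commands(workdir, var_file):
    """Build the argv lists for initializing and applying a Terraform config."""
    base = ["terraform", f"-chdir={workdir}"]
    return [
        base + ["init", "-input=false"],
        base + ["apply", "-input=false", "-auto-approve", f"-var-file={var_file}"],
    ]


def provision(workdir, var_file):
    """Run the commands in order; check=True stops at the first failure."""
    for argv in build_terraform_commands(workdir, var_file):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; paths are hypothetical.
    for argv in build_terraform_commands("dr-environment", "failover.tfvars"):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to review or log exactly what would run before `provision` is called.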

How the DR infrastructure is maintained

We have agreed with the customer to perform disaster recovery exercises for the staging environment on a monthly basis. This activity ensures:

  • confidence that the DR infrastructure always functions properly
  • that the DR environment evolves in step with the app’s development
  • tracking of, and compliance with, the agreed time range for restoring replicas of the on-premises infrastructure
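
The last point can be checked mechanically after each exercise. The following Python sketch compares the measured recovery time of a drill with an RTO target. The function name and the timestamps are invented for illustration; the one-hour target reflects the approximate RTO reported for this project.

```python
from datetime import datetime, timedelta


def drill_met_rto(started_at, restored_at, rto):
    """Return (elapsed, ok) for a single DR drill."""
    elapsed = restored_at - started_at
    return elapsed, elapsed <= rto


if __name__ == "__main__":
    # Timestamps are invented for illustration.
    start = datetime(2024, 1, 15, 9, 0)
    done = datetime(2024, 1, 15, 9, 47)
    elapsed, ok = drill_met_rto(start, done, rto=timedelta(hours=1))
    print(f"recovery took {elapsed}, within RTO: {ok}")
```

Recording the elapsed time of every monthly drill also shows whether recovery is slowing down as the application grows.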

Disaster Recovery Solution for Healthcare SaaS Architecture Diagram

AWS Architecture Diagram: Disaster Recovery Solution for Healthcare SaaS.

Amazon Web Services Utilized

  • Amazon Elastic Compute Cloud (EC2)
  • AWS Fargate
  • Amazon Simple Storage Service (S3)
  • Amazon Managed Streaming for Apache Kafka (MSK)
  • AWS Site-to-Site VPN

Verified by AWS

This case study is validated by AWS. Experts and professional auditors from AWS reviewed this case study and verified that we, Romexsoft, have built a functional infrastructure and efficient cloud solution.

It demonstrates that Romexsoft, as a certified AWS Advanced Tier Services Partner, delivers cloud solutions according to AWS standards and best practices.

Check out Romexsoft’s profile at AWS Partner Network.

The Results

What We Achieved Together

The AWS-based DR infrastructure designed and developed by Romexsoft holds the critical core of the customer’s SaaS, around which all other infrastructure components can be quickly provisioned to restore the complete system when the time comes.

With the implemented solution we also achieved:

  • compliance with US healthcare requirements
  • minimized interruption of critical processes
  • safeguarding of vital business operations
  • cost-effectiveness of the whole DR infrastructure
  • restoration of systems and services within a short period of time (a recovery time objective (RTO) of about one hour and a recovery point objective (RPO) of seconds)

Why Romexsoft

Romexsoft is an AWS-certified Consulting Partner, trusted Software Development Company and Managed Service Provider, founded in 2004. We help customer-centric companies build, run, and optimize their cloud systems on AWS with creative, elegant, and cost-efficient solutions.

Our key values

  • Delivery of quality solutions
  • Customer satisfaction
  • Long-term partnership

We have successfully delivered 100+ projects and have a proven track record in FinTech, HealthCare, AdTech, and Media industries.

Romexsoft holds a 5-star rating on Clutch thanks to its strong expertise, responsiveness, and commitment. 60% of our clients have been working with us for over 4 years.

Disaster Recovery Solution on AWS FAQ

How does an AWS pilot light strategy ensure rapid recovery in disaster scenarios?

An AWS pilot light strategy involves maintaining a minimal version of a system, with core components always running. In the event of a disaster, the system can be quickly scaled up to become fully operational, using the most recent data. Because this core system is always on standby, recovery times are significantly shorter than with a traditional backup-and-restore approach, which supports business continuity with minimal disruption.

What are the key benefits of using AWS disaster recovery solutions over traditional DR methods?

AWS disaster recovery solutions offer several advantages over traditional DR methods. They scale flexibly, allowing for cost-effective solutions that can be tailored to specific business needs. They also support high availability, with multiple Regions and Availability Zones to choose from, which helps preserve data integrity and availability even in the event of regional outages. Additionally, the pay-as-you-go model of AWS allows for cost savings, as businesses only pay for the resources they use.

How does integrating Infrastructure-as-Code (IaC) tools, like Terraform, optimize the DR process in AWS environments?

Integrating Infrastructure-as-Code tools, such as Terraform, into the DR process allows for the automated and consistent provisioning of AWS resources. This helps keep the DR environment consistent with the production environment, minimizing potential discrepancies during recovery. Because the infrastructure definitions are code, they can be kept under version control, so any changes to the infrastructure are tracked and can be rolled back if necessary. This level of automation and consistency makes for a smoother and more reliable DR process.

In the context of an AWS pilot light strategy, why is continuous data synchronization pivotal for successful disaster recovery?

Continuous data synchronization is crucial in an AWS pilot light strategy because it keeps the standby environment updated with the most recent data from the primary system. In the event of a disaster, the recovery process therefore restores the most up-to-date version of the system, minimizing data loss and supporting business continuity. Continuous synchronization also reduces the risk of discrepancies between the primary and DR environments, which makes the recovery process smoother.
