Multi-Tenant SaaS Architecture: Essential Principles and Approaches

If you are reading this article, chances are you are designing a robust and scalable multi-tenant SaaS solution within your technology stack. We suggest stepping back from specific products and considering the fundamental aspects of multi-tenant solutions, irrespective of the underlying technology.

The goal of the article is to offer a clear understanding of what it takes to create a resilient and high-performing SaaS environment. If this aligns with your interests, let's explore the following concepts.

  • Exploring multi-tenancy as a core focus of SaaS architecture
  • How architects balance its cornerstone requirements
  • How scalability is achieved in multi-tenant SaaS
  • What the optimal scaling strategies are

Essential Principles of Multi-Tenant SaaS Architecture

Architecting SaaS: Balancing Multi-Tenancy, Availability, and Cost Efficiency

Designing a SaaS solution entails placing the main focus on resilience, scalability, and availability. Amazon's AWS Well-Architected Framework sheds light on the availability, scalability, and resilience principles behind a stable SaaS environment. The guide provides insightful advice and recommended strategies for accomplishing these goals; while it is well worth reading in full, we will cover the basics further in this section.

The SaaS Architect Dilemma

In Software as a Service (SaaS), architects are tasked with striking a delicate balance among the primary requirements. Let us look more closely at what essentials have to be accurately proportioned to result in an efficient multi-tenant SaaS.

SaaS architecture dilemma


The complexity of multi-tenancy presents additional challenges for SaaS architects striving for robustness and scalability. Because all users are impacted by downtime simultaneously (compared to traditional systems, where the cost of an error is significantly lower), ensuring availability matters.

In multi-tenant SaaS architecture, the availability of a system matters even more. If a large multi-tenant SaaS provider runs into a notable outage, the news is going to gain traction and damage the provider's reputation. To avoid far-reaching effects on both your brand and your customers, it is critical to raise the bar for availability in multi-tenant SaaS setups.


As they expand, companies aim to increase their profits. Luckily, the economies of scale that are built into SaaS models are readily available to be leveraged.

On the other hand, availability, reliability, scalability, and compliance with security regulations require weighty investments. For this reason, SaaS architects are tasked with striking a balance between investing in infrastructure and avoiding over-provisioning.

Predictability and Dynamic Workloads

Another challenge for SaaS architects is predictability. While software inherently brings unpredictability, in a multi-tenant architecture, this issue is much aggravated by new tenants joining, existing tenants leaving, and their workload fluctuations throughout the day and month. Architects must design systems that ensure availability, efficiency, and other critical factors while accommodating the evolving usage patterns of tenants over time.

Flexibility for Various Deployment Models

Another challenge is the variety of architectural footprints that SaaS environments frequently have. Unlike situations where scale and resilience are attained within a single architectural framework, SaaS solutions need to support several deployment patterns, and SaaS architects may struggle to address scale, robustness, and other crucial elements across all of them. This includes pooled and siloed deployments, each with unique solution footprints and architectures. Keep reading to find out more.

Meeting Diverse Business Requirements

Lastly, there is a business imperative to sell into diverse market segments, catering to both small and large companies, each with varying needs. This requires offering different tiered experiences, potentially implementing throttling mechanisms, and customizing the user experience accordingly. The architecture must accommodate these requirements and deliver a scalable, flexible solution that meets the demands of multiple customer segments.
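As a concrete illustration of the throttling mechanisms mentioned above, tier-based limits can be sketched with a token bucket keyed by tenant tier. The tier names and per-second limits below are hypothetical, not taken from any specific product:

```python
import time

# Hypothetical per-tier request limits (requests per second); illustrative only.
TIER_LIMITS = {"basic": 10, "advanced": 50, "premium": 200}

class TierThrottle:
    """Token-bucket throttle whose rate and burst size depend on the tenant tier."""

    def __init__(self, tier: str):
        self.rate = TIER_LIMITS[tier]      # tokens refilled per second
        self.capacity = self.rate          # burst size: one second of traffic
        self.tokens = float(self.capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would hold one such bucket per tenant, so a basic-tier tenant exhausts its budget long before it can affect premium-tier neighbors.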

Balancing all the aforementioned requirements is a lot to handle. At one extreme, businesses demand maximum efficiency, minimal infrastructure costs, and shared resources to maximize profit margins: they emphasize economies of scale to achieve these goals.

On the other hand, they also want flexibility to support diverse markets, tiered offerings, and multiple deployment models. These objectives do not necessarily clash, but harmonizing them is a challenge; in fact, it is a key challenge in designing SaaS systems.

Designing for Multi-Tenant Scale

The Perspective on Scalability

Let’s examine the concept of scalability in the context of building a multi-tenant environment. When dealing with this requirement, it makes sense to expand our perspective beyond traditional notions. Often, people narrowly define scalability as either vertical scaling (increasing resources within a single server) or horizontal scaling (adding more servers to distribute workload).

Broadening your view of scale in SaaS

In the cloud era, scalability goes beyond just adding more resources. While elasticity and cloud constructs enable dynamic resource allocation, true scalability in a multi-tenant architecture involves more nuanced considerations. When designing for multi-tenancy, architects must account for unique challenges that arise from serving multiple customers on a shared infrastructure.

Onboarding Scalability

Scaling involves more than just infrastructure — it encompasses various aspects, including the onboarding of new tenants into the environment, which is often unjustly overlooked or delayed.

A common strategy is to prioritize scaling the application itself and treat onboarding automation as a secondary concern. If you have ever adhered to it, suppose you need to onboard hundreds or even thousands of new tenants overnight: would your current onboarding process be equipped to manage such a surge? For many, the honest answer is no. Experience proves time and again that onboarding processes must be in place to efficiently handle sudden increases in demand.

The market landscape suggests that consumer-oriented companies recognize the need to scale their onboarding processes due to their rapid growth requirements. On the other hand, business-to-business (B2B) organizations, which onboard fewer tenants monthly, may underestimate the significance of scaling their onboarding procedures.

Our advice is, regardless of the volume of tenants being brought into the system, to embrace onboarding as a fundamental aspect of your scalability planning.
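One way to make onboarding survive a surge is to build it from idempotent steps, so a failed or interrupted onboarding can simply be retried without manual cleanup. A minimal sketch, with hypothetical step names standing in for real provisioning calls:

```python
# Hypothetical onboarding steps; a real system would call provisioning,
# billing, and identity APIs inside the loop.
STEPS = ["create_record", "provision_stack", "configure_billing"]

def onboard(tenant: str, state: dict) -> list:
    """Run every onboarding step for one tenant.

    Steps already completed (tracked in `state`) are skipped, so a retried
    onboarding picks up exactly where it left off.
    """
    results = []
    done = state.setdefault(tenant, set())
    for step in STEPS:
        if step in done:
            results.append((step, "skipped"))
        else:
            done.add(step)                 # real provisioning call would go here
            results.append((step, "done"))
    return results
```

Feeding requests through a queue into workers running this routine lets the same pipeline absorb three tenants a month or a thousand overnight.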

Operational Scalability

Architects have a responsibility to equip their teams with the necessary tools, mechanisms, and constructs to effectively operate at scale. How else would they gauge the efficacy of architecture scaling if it were not for the right data, metrics, and operational insights? Operations must be carefully considered and integrated into the overall scalability strategy to ensure success.

Operational scalability is critical for SaaS applications as it directly impacts the service quality perceived by users and the overall success of the service provider. It requires a thoughtful approach to architecture and ongoing management to ensure that as more tenants join or as their usage grows, the system remains robust, secure, and responsive.
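Operating at scale starts with per-tenant visibility. A minimal sketch of aggregating raw request events into per-tenant metrics; the event shape `(tenant, latency_ms, ok)` is an assumption for illustration:

```python
from collections import defaultdict

def summarize(events):
    """Roll up (tenant, latency_ms, ok) request events into per-tenant metrics."""
    stats = defaultdict(lambda: {"requests": 0, "errors": 0, "total_ms": 0})
    for tenant, latency_ms, ok in events:
        s = stats[tenant]
        s["requests"] += 1
        s["total_ms"] += latency_ms
        if not ok:
            s["errors"] += 1
    # Derive the metrics an operator actually watches per tenant.
    return {
        tenant: {
            "requests": s["requests"],
            "error_rate": s["errors"] / s["requests"],
            "avg_ms": s["total_ms"] / s["requests"],
        }
        for tenant, s in stats.items()
    }
```

With tenant identity attached to every event, a single noisy or failing tenant stands out immediately instead of hiding inside aggregate averages.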

Deployment Scalability

Finally, deployment is just as important to scalability as the previous components. Consider this: how rapidly can you implement feature flags, release new features, and handle different deployment footprints across your SaaS systems? Every part of your organization needs to grow when more clients are added, not just the application's core infrastructure. Scalability includes deployment procedures and operational effectiveness in addition to technology.
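A per-tenant feature-flag check is one small piece of that deployment machinery. The sketch below assumes a hypothetical in-memory flag store where a flag can be enabled globally, per tier, or per individual tenant:

```python
# Hypothetical flag store; in production this would live in a config service.
FLAGS = {
    "new_checkout": {
        "default": False,           # not enabled globally
        "tiers": {"premium"},       # enabled for every premium tenant
        "tenants": {"tenant-42"},   # plus an explicit per-tenant override
    },
}

def flag_enabled(flag: str, tenant_id: str, tier: str) -> bool:
    """Evaluate a flag for one tenant: global default, tier, or tenant override."""
    cfg = FLAGS.get(flag)
    if cfg is None:
        return False
    return cfg["default"] or tier in cfg["tiers"] or tenant_id in cfg["tenants"]
```

Layered evaluation like this lets a feature roll out to one pilot tenant, then a whole tier, then everyone, without redeploying code.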

Understanding Scaling Strategies

We will try to condense the thoughts, concepts, and implications of developing a scaling strategy, and by far the best representation of those is the picture above.

First things first. The left-hand side illustrates the primary thing to consider: workloads in all their diversity of profiles, tenants' needs, and consumption behavior, all of which fluctuate over time. Addressing these workloads and understanding their dynamics is fundamental to any scaling plan.

On the right-hand side, we see options for addressing these workloads. This includes a comprehensive list of choices, such as selecting the appropriate compute stack — EC2, serverless, or containers — based on how well they align with different workload requirements and business objectives. Similarly, the storage stack selection, like RDS or DynamoDB, managed or unmanaged, plays a crucial role based on workload nature and business goals. Other factors like domain, industry, compliance requirements, and environmental norms also heavily influence the chosen strategy.

Key Architectural Considerations

The middle section integrates all of these considerations from both sides of the equation. It includes:

  • Tiering strategies
  • Isolation

Isolation might not be an obvious factor when thinking about scalability, but it plays a significant role. Depending on how resources are deployed and the architecture chosen, certain elements will scale differently than others. Some approaches offer good compromises on isolation, while others do not, which affects:

  • The “noisy neighbor” problem
  • Consumption optimization
  • Cost efficiency
  • Operational efficiency

Determining the most suitable compute stack for environments with spiky workloads is key to efficient scaling. When beginning this process, the focus should be on carefully evaluating each of these factors with both business stakeholders and architects to grasp the variety of available options. Overlooking these considerations and opting for technology based solely on personal preference can lead to unfavorable outcomes.

Simplest View of Scaling

Simplification works well for highlighting the essentials, so let us start with the simplest view of scale in SaaS, once again illustrated in the visuals.

This approach involves placing multiple tenants into a shared environment, utilizing what we refer to as a “shared pool” infrastructure. Here, storage and compute resources are shared among all tenants, scaling horizontally to accommodate their needs collectively. This setup typically involves some level of over-provisioning to ensure that the experience meets the requirements of all tenants.

As for the downsides, managing scaling policies and mechanisms in this shared environment can be complex, as it depends on the nature of the microservices and their scalability. Often, over-provisioning is used to mitigate potential problems with scaling policies. If this straightforward approach aligns with your SaaS environment, leveraging standard scaling tools and the benefits of pooled resources can suffice to meet your needs.

The Reality of Scale

Most real-world SaaS environments deviate from the simplified approach above. Apart from the few purely shared pools that do exist, shared environments usually coexist with other configurations.

Let us explore a few examples. Within a SaaS environment, one might encounter a scenario where certain microservices like “order” and “product” operate within a shared pool infrastructure. The “order” service might run on Lambda while the “product” runs on containers, each chosen based on workload characteristics.

Next, there are also tenant-specific microservices that are divided into silos. The need for this division may result from SLA requirements, compliance requirements, or other demands for a particular solution.
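A mixed pooled/siloed environment implies a per-tenant routing decision at the edge. A minimal sketch with hypothetical endpoints, where tenants with dedicated silos resolve to their own URLs and everyone else shares the pool:

```python
# Hypothetical routing table: most tenants share the pooled deployment,
# while tenants with strict SLA or compliance needs get a dedicated silo.
SILOED_TENANTS = {
    "tenant-7": "https://tenant-7.api.example.com",
}
POOLED_ENDPOINT = "https://pool.api.example.com"

def resolve_endpoint(tenant_id: str) -> str:
    """Return the deployment endpoint serving this tenant's requests."""
    return SILOED_TENANTS.get(tenant_id, POOLED_ENDPOINT)
```

Keeping this mapping in one place means a tenant can be promoted from the pool to a silo (or back) without the application code changing.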

The reality of scale in multi-tenant SaaS

Some clients also choose a full-stack silo strategy and get a completely dedicated silo environment. Even with these differences in infrastructure deployment, all tenants and services usually run the same software version and are administered through a single interface. What this short overview of deployment-style combinations shows is how much adaptability and customization SaaS environments offer to meet business needs and consumer preferences.

At the same time, each approach dictates its own scaling strategy. Scaling thus looks much more difficult for the mix of architectural patterns that makes up the majority of present-day SaaS environments. We will now look into the matter of scaling in more detail.

Microservice Decomposition

A sensible first step in designing a scalable SaaS system is defining personas, or user profiles. In-depth consumption profiles, isolation preferences, and tiering requirements for each persona make it possible to match deployment options with user demands more accurately, with user satisfaction in mind.

The choice of the optimal microservice decomposition technique allows architects to address a list of different objectives. To illustrate, individual workload characteristics and isolation needs will determine whether microservices should be pooled or isolated.

Scale starts with understanding workload profiles

On the other hand, customizing the microservice architecture to fit observed consumption patterns, instead of defining services purely around domain objects, may lead to microservices that do not directly correspond to domain entities but serve critical functional or performance-optimization purposes within the system architecture.

Microservice Decomposition Example
Consider this: a service like “order” is initially made up of several Lambda functions representing separate processes. As a single microservice, this architecture appears adequate at first. Over time, after a more thorough examination of isolation and consumption needs, one may discover that the service should be divided into four distinct microservices.

Decomposing to microservices for multi-tenant workloads

Let’s imagine, in this simplified scenario, that “fulfillment” had special isolation needs or became a vital scaling component of the system. The decomposition was modified to optimize for scalability and isolation requirements based on these workload characteristics. This brings awareness of workload profiles to a whole new level.

Choosing Compute Technologies

Choices like serverless (Lambda), containers, and other deployment patterns have to match particular requirements, but when weighing compute technologies for your environment, keep in mind that they can play different roles within a SaaS architecture and are not mutually exclusive.

Particularly, some designs could use containers for the application plane or certain microservices and serverless for the control plane. Depending on the requirements for performance and the peculiarities of the workload, one may choose to use Lambda functions, containers, or batch workloads.

Having an early grasp of the deployment methods from working closely with product teams is an advantage. Armed with that knowledge, you can establish a thorough scaling plan from the start. This entails anticipating client demands, such as whether full-stack silo deployments will be necessary, and increases the likelihood that the architecture will suit a wide range of deployment situations and scaling requirements.

Classic Horizontal Scaling

Now let’s take a closer look at the selection of computation technology by seeing how compute settings affect the way a microservice like “order” with methods like GetOrder and UpdateOrder is deployed.

Classical horizontal scaling in multi-tenant SaaS architecture

With EC2, a well-known AWS solution, the microservice itself is the deployment unit, meaning elasticity and load spikes are handled through horizontal scaling. Relying only on EC2 can present hurdles in a multi-tenant architecture with varying workload constraints, particularly when it comes to promptly handling spiky loads. Over-provisioning frequently results as a safeguard.

Evaluating the speed at which instances can spin up and respond becomes particularly relevant in pooled scenarios, where several tenants can make use of shared resources to minimize idle capacity and maximize resource usage.

Lambda Scaling

With Lambda, and particularly its interoperability with SaaS multi-tenancy, we have seen significant progress. Lambda represents a transition to a managed compute service in which individual functions, rather than the service as a whole, serve as the unit of scale. This means that if one function, such as UpdateOrder, is used heavily while GetOrder is used less, scalability and cost adjust accordingly. This granularity removes the requirement for fine-grained service decomposition while enabling effective scaling based on real usage patterns.

Lambda scaling in multi-tenant SaaS

Another benefit of using Lambda’s approach is that it removes the need to worry about choosing the best scaling policy or making sure the microservice scales as a whole. Scaling, instead, takes place at the function level, maximizing cost-effectiveness and resource allocation.
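The per-function billing model can be illustrated with a back-of-the-envelope cost calculation: each function's spend tracks its own invocation volume rather than the whole service's peak. The default GB-second price below reflects AWS's published x86 Lambda compute price at the time of writing, but treat it as an illustrative assumption and check current pricing:

```python
def lambda_cost(invocations: int, duration_ms: int, memory_mb: int,
                price_per_gb_s: float = 0.0000166667) -> float:
    """Approximate Lambda compute cost for one function (excludes request fees)."""
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * price_per_gb_s

# UpdateOrder is hot, GetOrder is quiet; each is billed (and scaled) independently,
# so the quiet function does not inherit the hot function's footprint.
hot = lambda_cost(invocations=1_000_000, duration_ms=100, memory_mb=1024)
quiet = lambda_cost(invocations=10_000, duration_ms=100, memory_mb=1024)
```

On a per-service compute model, both methods would ride on capacity sized for the hot path; here the quiet function's cost scales down with its own usage.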

Lambda also works well with both pooled and siloed deployment methods. Exactly how both options can feature in a serverless SaaS design, and what benefits they entail, is a useful topic for further research.

Container Scaling

Because of the variety of tools and features that EKS (Amazon Elastic Kubernetes Service) provides, many SaaS organizations believe it to be a good deployment approach. EKS offers features like namespace per tenant, opening up new insights into deployment strategies. With Kubernetes namespaces, this method enables resource separation, providing flexibility and control over workloads particular to individual tenants.

In the EKS environment, workloads may be assigned to certain nodes thanks to features like node affinity, which allows for rapid scalability without much over-provisioning. The Kubernetes cluster operates more efficiently and consequently uses its resources more effectively.

Container scaling in multi-tenant SaaS architecture

AWS Fargate is also included in EKS, providing a serverless choice in the Kubernetes domain. Users can function serverlessly by abstracting away the cluster’s underlying infrastructure elements while using Fargate. This simplifies operational overhead and makes deployment more efficient.
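The namespace-per-tenant approach described above can be sketched as generating a Namespace plus a ResourceQuota for each tenant. The manifests below are plain dicts in standard Kubernetes v1 schema that a client library or `kubectl` could apply; the `tenant-<id>` naming convention is an assumption:

```python
def tenant_namespace_manifests(tenant_id: str, cpu_limit: str, mem_limit: str) -> list:
    """Build a Namespace and a ResourceQuota capping one tenant's resource usage."""
    ns = f"tenant-{tenant_id}"
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{ns}-quota", "namespace": ns},
        # Hard limits bound this tenant's workloads, containing noisy neighbors.
        "spec": {"hard": {"limits.cpu": cpu_limit, "limits.memory": mem_limit}},
    }
    return [namespace, quota]
```

Pairing each tenant namespace with a quota is what turns the namespace from a naming convenience into an actual isolation and scaling boundary.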

Choosing the Right Scaling Strategy for Multi-Tenant SaaS

There is no right or wrong solution when choosing a scaling approach. In fact, there are a number of possibilities to consider, and the best approach depends on examining the individual workloads and the personas connected to them. With workload characteristics, isolation requirements, and consumption patterns in hand, architects can tailor the scaling approach to best meet the needs of the SaaS environment. So, let us briefly overview the options.

Deployment models and scaling profile of multi-tenant SaaS

Silo Scaling

Silo scaling refers to a scaling approach where each tenant’s data and services are handled by separate instances or sets of resources within the cloud environment. This method can be seen as creating individual “silos” for each tenant, rather than having all tenants share the same application and database instances.

Siloed scaling can be the starting point for our analysis of how deployment types affect scalability. In deployment models that are siloed, meaning that every tenant has a private, isolated environment, the scaling profile is usually more predictable. This could be in line with the tenant’s business activities because it resembles conventional systems with distinct life cycles. Such settings are usually easy to scale, following predictable patterns and with no need for complicated modifications.

Pool Scaling

Conversely, consumption patterns become more variable and unpredictable in pooled situations, where resources are shared among several tenants. Scaling is difficult in these situations since resource utilization varies in peaks and troughs. While worries about idle capacity may be weaker, problems like noisy-neighbor effects become more noticeable and must be properly controlled to guarantee consistent performance for all tenants.

Pool Scaling in the context of multi-tenant architectures involves a shared resource model where multiple tenants utilize the same underlying infrastructure. This approach is fundamentally different from silo scaling, where each tenant has dedicated resources. Pool scaling leverages a communal set of resources that dynamically serves all tenants, optimizing resource utilization and cost-efficiency.

Scaling with Tenant Pods

One further aspect of scale goes beyond the limits of discrete services like compute and storage. To accommodate this larger perspective, tenants are grouped into pods, each of which houses a certain number of tenants. By creating scaling rules specifically designed for these pods, which often host a predetermined number of tenants, we can ensure that scaling is optimal and that any problems or interruptions stay within the boundaries of a pod. Because the impact is restricted to the subset of tenants within one pod, this method improves isolation while increasing efficiency.

Once a pod's limits and ideal tenant capacity have been determined, the next stage is to spin up more pods to horizontally scale the environment. Instead of relying on fine-grained service-level policies, this strategy focuses on sharding and scaling whole pods. If a tenant inside one pod starts to cause trouble or no longer fits, management and scalability issues can be resolved by migrating it to another pod.
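The pod-level sharding described above can be sketched as a simple assignment routine: fill a pod up to its capacity, then spin up the next one. The capacity of three tenants per pod is an arbitrary illustrative number:

```python
# Illustrative pod capacity; a real value would come from load testing a pod.
POD_CAPACITY = 3

def assign_pods(tenant_ids: list) -> list:
    """Shard tenants into pods, opening a new pod whenever the last one is full."""
    pods = []
    for tenant in tenant_ids:
        if not pods or len(pods[-1]) >= POD_CAPACITY:
            pods.append([])   # "spin up" a new pod
        pods[-1].append(tenant)
    return pods
```

Scaling rules then apply per pod rather than per service, and a disruptive tenant can be moved to another pod without touching the rest of the fleet.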

Multi-region models may be supported by extending this pod-based scaling technique. Using pods as the deployment unit makes a multi-region footprint possible, since pods can be spread across many regions. However, deployment and operations become more complicated: managing deployments across pods requires aggregating operational data from all of them to provide thorough visibility and management.

Although this method adds more complexity, it offers a distinct viewpoint on scalability and may be a useful tactic based on the particular needs and factors of the SaaS environment.

Yuriy Bondarenko Delivery Manager, Romexsoft