How to Increase The Scalability of a Web Application

Learn the best practices to improve scalability of your web application from this quick guide.

Time is money. That’s a worn-out expression by now, but consider how it affects your product’s success, especially when it comes to web-based applications.

The average user expects a web page to load within 2 seconds. Some “patient” types will wait up to 10 seconds before bouncing off your website for good. Now, imagine what happens when a couple of thousand users want to use your web product simultaneously. All of them still expect it to perform the desired operation within 2 to 5 seconds.

At this point, you are bound to think harder about your web app performance. Will it handle the load? How much load can it handle while still maintaining a reasonable response time?

What Makes for Excellent Web Application Performance?


Application performance may mean many things, but for web app developers, it means just two key factors:

Response Time

This refers to the amount of time it takes from an initial user request to receipt of a response. It must be rapid, given today’s user demands for web-based software.


Scalability

More online users mean more concurrent workloads running within your app. Scalability can be measured as the ratio of the increase in app performance to the increase in computing resources used. For scalable applications, it also means the ability to provision extra resources without changing the structure of the central node. In other words, scalable web apps adapt quickly to surges in usage and remain stable even at peak load.
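To make that ratio concrete, here is a minimal sketch; the throughput numbers and node counts below are hypothetical:

```python
def scaling_efficiency(base_throughput, scaled_throughput, base_nodes, scaled_nodes):
    """Throughput gain divided by resource gain; 1.0 would be perfect linear scaling."""
    return (scaled_throughput / base_throughput) / (scaled_nodes / base_nodes)

# Doubling the nodes (2 -> 4) raised throughput from 500 to 900 requests/s:
print(round(scaling_efficiency(500, 900, 2, 4), 2))  # 0.9, i.e. 90% scaling efficiency
```

A value well below 1.0 signals that the extra resources are not translating into proportional performance.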

Two other important terms worth mentioning in the context of scalability architecture are:

  • A saturation point – the workload level beyond which a system can no longer cope. Once it is reached, the application starts to malfunction. The goal of scalable architecture is to adapt resource provisioning before your operations go haywire and end in failure.
  • Recoverability – the system’s ability to return to normal operation after a failure. Fast recoverability translates to less downtime.

Response Time vs Scalability

It’s important to note that a fast response time does not guarantee effective scalability. For instance, an app can have a poor response time yet tolerate a high number of concurrent user requests, and vice versa. Thus, to ensure top web application performance, you’ll need to strike a balance between these two parameters.

Key Tips for Building Scalable Web Applications

A web-based app consists of three key elements – network connectivity (the Internet), the application server, and a database server.
This, in turn, leaves you with four areas where scalability can be applied:

  1. Disk I/O
  2. Network I/O
  3. Memory
  4. CPU

Thus, your first task is to determine where the bottlenecks occur.
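A low-tech way to start locating bottlenecks is to instrument suspect code paths and compare where the time actually goes. The sketch below is a generic Python timing decorator; the handler and its label are hypothetical:

```python
import time
from functools import wraps

timings = {}  # label -> list of elapsed seconds

def timed(label):
    """Record how long each call takes so slow spots stand out in the numbers."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings.setdefault(label, []).append(time.perf_counter() - start)
        return wrapper
    return decorator

@timed("render_page")
def render_page():
    time.sleep(0.01)  # stand-in for real work: disk reads, queries, network calls
    return "<html>ok</html>"

render_page()
print(round(timings["render_page"][0], 3))  # roughly 0.01
```

Once you know which handlers dominate the response time, you can map them back to the four resource areas above.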

Brief Look at Key Factors of Application Scalability


Broadly, you can organize the segments of a web application most affected by performance and scalability issues into the following groups:

  • Static resource tier (browsers; web servers; client-side languages such as HTML, JavaScript, CSS).
  • Business logic tier (Server-side programming and scripting languages such as PHP, Java, Python; server operating systems; application servers).
  • Permanent storage tier (data storage engines used; data access mechanisms such as SQL, ORM, GQL, etc; operating systems; file storage).

Each of these three areas can pose a different set of challenges when it comes to increasing the performance and scalability of a web application. Let’s specifically look into the common scenarios and solutions to them.

Performance Tuning


Performance tuning involves a thorough troubleshooting session to identify gaps in the scalable website architecture and other issues causing performance loss. Typically, it may also include refactoring the web application’s source code, reviewing the current configuration settings, implementing new caching strategies, and running a series of investigative procedures against the different tiers of the web app.

Some of the best practices, in this case, are as follows:

  • Create a list of very specific functional requirements. It might sound like a no-brainer, but requirements are often summed up as “the application must load fast” without specifying an exact number (2, 5, 15 seconds, etc.). Set very precise goals, such as: “97% of ‘Create an account’ requests should respond in less than 2 seconds, measured on the web server.”
  • Automate. If you face time and/or budget constraints, use automated testing to measure the app’s performance under load. Good tools for this include JMeter and Ranorex.
  • Don’t over-optimize. Performance tuning means your team should fix only the issues that keep the app from meeting the set requirements. The more you optimize your application, the more code you will need to touch, which may introduce new, unexpected issues and lengthen delivery time.
  • In terms of how to improve scalability, caches are no exception. The concept is this: the most recently requested data is likely to be requested again, so that information is kept “up front,” where it can be retrieved faster. Be aware, though, that while adding caches at the outer layers of the web app increases performance, it also introduces its own limitations, such as keeping cached data consistent with the source.
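The first practice above can be turned into an automated check. Below is a sketch of a nearest-rank percentile test against the hypothetical “97% under 2 seconds” goal; the latency samples are made up:

```python
import math

def meets_slo(latencies_s, threshold_s=2.0, percentile=97):
    """Nearest-rank percentile check: True if the percentile latency is under threshold."""
    if not latencies_s:
        return False
    ordered = sorted(latencies_s)
    rank = math.ceil(percentile / 100 * len(ordered))  # nearest-rank method
    return ordered[rank - 1] < threshold_s

samples = [0.4, 0.6, 0.7, 0.9, 1.1, 1.2, 1.3, 1.5, 1.8, 3.5]
print(meets_slo(samples))  # the 97th-percentile sample here is 3.5 s -> False
```

Wired into a CI pipeline, a check like this turns a vague “must load fast” requirement into a pass/fail signal.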

Performance tuning is the first step toward understanding how to increase the performance and scalability of a web-based app. Unlike scaling, this procedure is less invasive and requires less time and budget to carry out successfully.

Vertical Scaling


Vertical scaling occurs when more resources are added to a single computer system. For instance, suppose one of your web app’s components starts requiring more physical memory to process all the incoming requests, but it is limited by the capacity of its single node.
To fix the issue, you can add more CPU, memory, bandwidth, or I/O capacity to that node, thus reducing the app’s sluggishness. If you can perform this action, you have scaled the application vertically.
Vertical scaling is often deemed cheaper and simpler, as it does not require significant changes to the web application’s source code. However, the major drawback of this approach is that it may not fix the issue in the long term: you cannot keep adding resources to a node indefinitely. At some point, you’ll hit the “wall” posed by the limitations of the operating system and hardware themselves.

Here are the common constraints to account for:

  • Limited TCP ports. An operating system instance provides only a single set of TCP ports, so, for instance, you won’t be able to run two web servers or two proxies that need the same port within one node.
  • OS and provider hardware limitations. Certain operating systems have built-in caps and cannot address an ever-growing pool of resources, and the underlying hardware may be unable to provide them anyway. Your web hosting provider, for instance, may set a lower resource threshold than the OS vendor supports; in that case, the efficiency of vertical scaling will suffer unless you opt for another service provider.
  • Security and management constraints, which may require the web application to be split across two or more operating systems, something a single vertically scaled node cannot accommodate.

While vertical scaling may be faster and cheaper to implement, it may not be a viable long-term solution for a scalable web application or a scalable website. That’s why you should consider some other options.

Horizontal Scaling


Horizontal scaling occurs when more nodes (VMs) are added to work in parallel. This way your app can receive more resources not from a single node, but from multiple ones.
If you want to implement horizontal scaling, however, several changes will have to be made:

  • You’ll need a tech solution that facilitates the distribution of user requests across different VM instances, such as a load balancer or a traffic manager.
  • You’ll also need to start shifting toward a distributed system architecture.

A load balancer is a device that acts as a reverse proxy and helps administer application and network traffic across different servers.
A traffic manager typically uses DNS to route requests to specific service endpoints based on configured traffic-management rules. Below are several types of rules you can use to distribute traffic effectively across different VMs:

  • Round Robin: redirect traffic in a rotating sequential manner.
  • Weighted Response Time: redirect traffic to the fastest responding server.
  • Chained Failover: redirect traffic to the next server in the chain only when the current one can no longer accept requests.
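These rules are normally configured in the load balancer or traffic manager itself, but their logic is simple enough to sketch. Here is a toy round-robin and chained-failover dispatcher; the server names are hypothetical:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin dispatcher over a fixed pool of server names."""
    def __init__(self, servers):
        self._pool = cycle(servers)

    def next_server(self):
        return next(self._pool)

def chained_failover(servers, has_capacity):
    """Return the first server in the chain that still accepts requests, else None."""
    for server in servers:
        if has_capacity(server):
            return server
    return None

lb = RoundRobinBalancer(["vm-1", "vm-2", "vm-3"])
print([lb.next_server() for _ in range(4)])  # ['vm-1', 'vm-2', 'vm-3', 'vm-1']

full = {"vm-1"}  # pretend vm-1 is saturated
print(chained_failover(["vm-1", "vm-2"], lambda s: s not in full))  # vm-2
```

Weighted response time follows the same shape, except the selection function picks the server with the lowest recent latency instead of the next one in rotation.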

How to Approach Horizontal Scaling

The best practices for scaling web applications, in this case, would be to first decouple them using the tier system mentioned earlier:

  • Static content tier, which covers the visible elements of your web app’s interface: static JPG images, cascading style sheets, JavaScript libraries, etc.
  • Business logic tier is the framework your solution uses to process user data or data from the permanent storage tier. That could be Java, PHP, Ruby on Rails, or other options.
  • Permanent data storage tier is where all the persisted data lives. Most web applications use relational database systems for that.

One of the most common approaches to improving the response time of a large-scale web application is to move the permanent storage tier onto a separate node. That frees up resources and lets the rest of the application run faster.
Yet here’s another common scenario: the permanent storage has been migrated, but after some time the web app becomes sluggish again. At this point, you can either scale the new node vertically or keep scaling horizontally and try to separate the business logic tier from the static content tier.

In this case, your web app uses the same node to deliver both static and dynamic content to users, which may cause performance hiccups. To tackle this, it may be worth migrating the static content tier to a separate node, as it is easier to decouple than the business logic tier. Further down the road, you can apply horizontal scaling to the newly separated tiers to mitigate scalability issues even further.

Improving Database Performance

If you have determined that your databases are the reason for performance bottlenecks, below are several strategies worth trying.

The simplest fix to implement is caching your database queries.

You can run a quick query-log analysis to determine which queries run most frequently and which take the longest to complete. Afterward, you can cache the responses to those queries so that they stay in your web server’s memory and can be retrieved faster. This should somewhat reduce the load on your database.
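As a sketch, query caching can be as simple as an in-memory map from query text to rows with a time-to-live; the query and rows below are placeholders for real database traffic:

```python
import time

class QueryCache:
    """Tiny in-memory cache for database query results with a time-to-live."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # query string -> (expiry timestamp, rows)

    def get(self, query):
        entry = self._store.get(query)
        if entry and entry[0] > time.monotonic():
            return entry[1]           # fresh hit
        self._store.pop(query, None)  # expired or missing
        return None

    def put(self, query, rows):
        self._store[query] = (time.monotonic() + self.ttl, rows)

cache = QueryCache(ttl_seconds=60)
query = "SELECT name FROM users WHERE active = 1"
rows = cache.get(query)
if rows is None:
    rows = [("alice",), ("bob",)]  # stand-in for a real database round trip
    cache.put(query, rows)
print(cache.get(query))  # [('alice',), ('bob',)]
```

In production you would typically reach for Redis or Memcached rather than a per-process dict, but the lookup-then-populate pattern is the same.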
Next, you can implement database indexes that will reduce the time your database needs to locate the data for a certain query.
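The effect of an index is easy to observe with SQLite’s query planner. In this sketch (the table and index names are made up), the same query goes from a full table scan to an index search:

```python
import sqlite3

# In-memory database standing in for the app's relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

query = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"

# Without an index, the planner must scan every row...
plan_before = conn.execute(query).fetchone()
print(plan_before[-1])  # e.g. "SCAN orders" (wording varies by SQLite version)

# ...with one, it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(query).fetchone()
print(plan_after[-1])   # e.g. "SEARCH orders USING INDEX idx_orders_customer ..."
```

Indexes speed up reads at the cost of slightly slower writes and extra storage, so add them for the columns your slowest frequent queries actually filter on.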

You can also improve session storage. This is particularly useful if your app reads and writes a lot of session data. There are several ways to accomplish this:

  • You can migrate your session storage to an in-memory caching tool such as Redis or Memcached. These are much faster to access. However, some data might be lost if the caching system needs to reboot or goes offline.
  • Or you can move the session information into the cookie itself. In that case, however, you must not store any sensitive customer data there.
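The cookie option usually means signing the session payload so the client cannot tamper with it. Here is a minimal sketch using Python’s standard library; the secret and session fields are hypothetical:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"rotate-me"  # hypothetical server-side signing key

def encode_session(data: dict) -> str:
    """Serialize session data and append an HMAC so tampering is detectable."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def decode_session(cookie: str):
    """Return the session dict if the signature checks out, else None."""
    payload, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or corrupted cookie is rejected
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = encode_session({"user_id": 42, "theme": "dark"})
print(decode_session(cookie))        # {'user_id': 42, 'theme': 'dark'}
print(decode_session(cookie + "x"))  # None
```

Note that signing only prevents tampering; the payload is still readable by the client, which is why sensitive data must stay out of it.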

The most invasive method for improving database performance is splitting the database:

  • Vertically (partitioning) – create a new set of loosely coupled sub-databases based on topical concepts e.g. customer orders, customer payment information, etc.
  • Horizontally (sharding) – split the rows across multiple databases based on certain attributes, such as ranges or hashes of a customer ID.
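Sharding hinges on a stable mapping from a shard key to a database node, so the same record always lands in the same place. A sketch with hypothetical shard names:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]  # hypothetical shard names

def shard_for(customer_id: int) -> str:
    """Map a customer to a shard via a stable hash of the shard key."""
    digest = hashlib.sha256(str(customer_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same customer always lands on the same shard:
print(shard_for(1001) == shard_for(1001))  # True
print({shard_for(i) for i in range(100)})  # traffic spreads across all shards
```

The catch is that modulo-based placement reshuffles most keys when you add a shard, which is why production systems often use consistent hashing instead.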

Alternatively, you can migrate your databases to AWS and set up auto-scaling using the company’s managed database services:

  • Amazon Relational Database Service (Amazon RDS) – lets you set up your database in the cloud using one of the following database engines: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, MariaDB, and Amazon Aurora.
  • Amazon DynamoDB – a proprietary NoSQL database service.

Database migration to the cloud is a tedious and complex task; however, the payoff can also be the largest of the options discussed.


Building a web app is one thing. Making it scalable, whether upfront or after it has already been developed, is quite another matter. Developing a highly scalable architecture requires time, expertise, and careful planning.

Romexsoft team would be delighted to help you improve the scalability of your existing products or help develop new resilient and scalable cloud web apps on AWS. To get a better sense of the results we can achieve, take a look at our case study Building SaaS Banking Platform for FinTech Company.

Written by Romexsoft on April 14, 2017 (edit 2019)

Serhiy Kozlov CEO/CTO at Romexsoft - AWS Partner in Cloud Migration & Application Modernization | AWS Certified Cloud Practitioner | LinkedIn Profile