Scaling Your Web Application

Architecture Design For Large-Scale Systems

Harith Javed Bakhrani
8 min readMar 26, 2020

Today, the internet is full of resources that teach you how to code, and how to create web applications, but it lacks resources on how to make your web app scalable so that it can accommodate thousands of users without having performance issues and making it fault-tolerant. In this article, I would briefly go through some terms and technologies to give you a glimpse of how you can better design architecture for large-scale systems.

Hosting Web App Without Considering Scalability

Before we begin to expound on how we can make our web app scalable and fault-tolerant, we would look at a typical architecture.

Simple Web Architecture

In the above diagram, we can see a very simple web architecture; we have only one server that listens for client requests and communicates with the database. That’s really it, nothing complex to it.

Considering Scalability

Now that we have already looked at how we would host our web app on a simple architecture, we would begin exploring different kinds of issues that we would run into with this kind of architecture and how we would tackle them.

Issue #1: Web Server Overload

As we get more and more concurrent clients connecting to our web server, our web server would eventually run out of resources (like CPU and RAM) and “die”. In this case, we need to increase our web server resources to accommodate more and more clients.

One way to achieve this is via vertically scaling our server. Let us have an in-depth look at what this means;

Vertical Scaling

To vertically scale means to add resources to our existing server or to replace it with another powerful server.

Vertically Scaling Web Server

Basically, the architecture remains the same, it’s just that the server has been upgraded to accommodate more clients than before. This would solve our issue temporarily, but it’s not a permanent fix. As more and more people begin to use our web app, the added resources would eventually run out and we would need to keep on vertically scaling our server. We have a better fix for this, which also eliminates a second concern of having a single point of failure.

Issue #2: Single Point Of Failure + Issue #1

Although we have taken a look at how to vertically scale our server, we have not found a lasting solution, nor do we have a fault-tolerant infrastructure.

What exactly does it mean to have a fault-tolerant infrastructure? At the moment, we have only one server, which can go down for a number of reasons, for example, the server might need some maintenance to be done. In order to achieve a fault-tolerant infrastructure, we set up multiple servers that host the same application, so that if any of the servers go down, we still have other servers running.

Now that we have a fault-tolerant infrastructure, let us see how we would scale it in order to support more and more people.

Horizontal Scaling

To horizontally scale means to add additional servers that serve the same purpose. As our application continues to get popular day by day, the current servers exhaust out of resources by supporting all the clients, thus we need to add more servers to serve other incoming clients.

Horizontal Scaling

Having multiple servers gives birth to another issue; how would we distribute the traffic between the servers in an effective manner?

Issue #3: Distributing Traffic

To evenly distribute traffic across the servers, we use load balancers.

Distributing traffic via a load balancer

The load balancer acts as an intermediary between the clients and the servers, it knows the IP addresses of the servers and thus is able to route traffic from the clients to the servers.

There are various methods that a load balancer can use to route traffic between the servers, one of them is round robin which sends requests to the servers on a cyclical basis. For example, if we have 3 servers, then it would send the first request to server 1, the second request to server 2, third request to server 3, and fourth request back to server 1 again. However, the most efficient method is when the load balancer would check if the server is capable of handling the request and only then send the request.

Having just one load balancer to do the work raises the issue we mentioned before; we have a single point of failure, if this load balancer dies out then we do not have a backup. To overcome this issue, we can set up two or three load balancers where one would be actively routing the traffic and the others would be simply there as a backup. The load balancers can be a tangible piece of hardware or they can simply be a software in one of the servers. Today, with the cloud services available at our fingertips, it is relatively cheap and easy to establish a load balancer.

We have finally scaled our servers horizontally and routed traffic between them in an evenly distributed manner using a load balancer 🙌🏽. What can possibly go wrong now? This leads us to Issue number 4!

Issue #4: Factoring out sessions

If for some reason your web app is using sessions to identify returning users, then having multiple servers would not work out because the sessions are by default stored in servers. For example, if user A logs in for the first time the load balancer might take him to server 1 where his session would be stored, on his second request, the load balancer might take him to server 2 where he would need to log in again because server 1 had stored his session.

To solve this issue, we can decouple the sessions to a different storage solution, for example to a Redis server. This way all the servers would be getting and storing their sessions from and to the Redis server. Of course, we can add redundancy for the Redis server as well to eliminate single point of failure.

Decoupling the sessions

Other than sessions, if there is anything else being stored on the servers, it should be decoupled to its own storage solution and all the servers should be given access to that storage location.

Issue #5: Queries Are Expensive

When a user loads a page, or hopes to another page by clicking a link, the web app would most likely query a database. For a small application, that is handling around a hundred to a thousand users, querying a database is non-trivial when it comes to performance, but when the question of serving more than a million users arises, then the answer to this is to use a cache to boost the performance of the web application.

When a user logs in, it is most likely that we would run a database query to get the user at every request. In this scenario, we would first try to get the user from the cache, if the user exists then there is no need to query the database, and if the user does not exist, then we would go ahead and query the database. Let us write some pseudocode:

user = getUserFromCache(userId);if (user == Null) {
user = getUserFromDb(userId);

The number of users that could be stored in a cache is finite because, of course, computer memory is finite. For the cache to keep on storing new and new users as they come in, the cache would need to remove inactive users from its memory.

Issue #6: Possibility Of Database Failure

By now we have known that having just one server for a particular job is bad because it introduces a single point of failure to our infrastructure. At the moment, we have only one database that is storing all the information, in order to introduce multiple database servers we use a technique known as replication.

Replication: master-slave architecture

Replication is all about making automatic copies of something. Generally, we would have a master database where we would read data from and write data to and one or more slave databases which would be connected to the master database via a network connection and their purpose is to get a copy of every row that exists in the master database. Simply speaking, every query that gets executed on the master database would be executed on the slaves. Whenever the master database would pass out, one of the slaves would get promoted to become the master.

This is also a good topology for websites like Medium which are read-heavy (people read more blog posts than they publish). All of the read requests could be balanced out between the slaves while all of the write, edit, and delete requests would be sent to master.

Another paradigm is master-master architecture in which we have multiple masters connected to multiple slaves. Although we wouldn't talk much about this, just remember that this is also another nice option to have.

Issue #7: Queries Are Still Expensive

Having more than ten million users query from the same database is not that good as it would take the database some time to search for a single user in the midst of ten million users. Sharding is used to increase database efficiency so that queries take less time to get executed.

Sharding is basically to have two or more databases so that queries could be split between them according to some metric. For example, we could have two databases and we could reserve the first database for users whose name begins with A-M and the second database for users whose name begins with N-Z.


This marks the end of our blog post. I hope you learned a lot and are ready to scale your web app to the next ten million people!



Harith Javed Bakhrani

Muslim DevOps Engineer ready to learn and bring to life new and better ways of automating deployments and keeping them alive!