There are two main types you can use: layer 4 (transport layer) load balancers and layer 7 (application layer) load balancers.
The main transport layer protocols are TCP and UDP, so an L4 load balancer will make routing decisions based on the packet headers for those protocols: the IP addresses and the ports. You’ll frequently see the terms “4-tuple” or “5-tuple” hash when looking at L4 load balancers.
The 5-tuple is the combination of source IP address, source port, destination IP address, destination port, and protocol.
With a 5-tuple hash, you use all five of those fields to create a hash and then use that hash to determine which server to route the request to. A 4-tuple hash uses four of them (typically dropping the protocol).
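As a minimal sketch, here's what a 5-tuple hash might look like in Python (the backend IPs and the connection's field values are made up for illustration; real load balancers do this in the packet-processing path, not per-request application code):

```python
import hashlib

def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    """Hash the 5-tuple and map it onto the backend pool.

    A 4-tuple hash would simply omit one of the fields (commonly the protocol).
    """
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{protocol}".encode()
    # Use a stable hash so the same connection always maps to the same backend.
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return backends[digest % len(backends)]

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
server = pick_backend("203.0.113.7", 51234, "198.51.100.5", 443, "TCP", backends)
# The same 5-tuple always yields the same backend, so all packets for one
# TCP connection land on the same server.
assert server == pick_backend("203.0.113.7", 51234, "198.51.100.5", 443, "TCP", backends)
```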
Layer 7 load balancers operate at the application layer, so they have access to the HTTP headers: they can read data like the URL, cookies, content type, and other headers. An L7 load balancer can consider all of these when making routing decisions.
Popular load balancers like HAProxy and Nginx can be configured to run at layer 4 or layer 7. AWS's Elastic Load Balancing service provides the Application Load Balancer (ALB) and the Network Load Balancer (NLB), where ALB is layer 7 and NLB is layer 4.
The main benefit of an L4 load balancer is its simplicity: it only uses the IP address and port to make its decision, so it can handle a very high rate of requests per second. The downside is that it has no ability to make smarter load balancing decisions, and features like caching responses aren't possible.
Layer 7 load balancers can be a lot smarter and forward requests based on rules set up around the HTTP headers and the URL parameters. Additionally, you can do things like cache responses for GET requests for a certain URL to reduce load on your web servers.
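As a rough illustration of the kind of rules an L7 load balancer evaluates, here's a hand-written sketch in Python (the paths, pool names, and request shape are all hypothetical, not any particular load balancer's API):

```python
def route(request, pools):
    """Pick a backend pool using application-layer data: URL path, cookies, etc."""
    if request["path"].startswith("/api/"):
        return pools["api"]                  # path-based rule
    if "session_id" in request.get("cookies", {}):
        return pools["app"]                  # cookie-based rule (sticky sessions)
    return pools["static"]                   # default pool

pools = {"api": ["api-1", "api-2"], "app": ["app-1"], "static": ["cdn-1"]}
print(route({"path": "/api/users", "cookies": {}}, pools))   # ['api-1', 'api-2']
print(route({"path": "/home", "cookies": {}}, pools))        # ['cdn-1']
```

An L4 load balancer can't express any of these rules, since the path and cookies live in the HTTP payload rather than the TCP/UDP headers.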
The downside of L7 load balancers is that they can be more complex and computationally expensive to run. However, CPU and memory are now sufficiently fast and cheap that the performance advantage of L4 load balancers has become pretty negligible in most situations, so most general-purpose load balancers operate at layer 7. That said, you’ll also see companies use both L4 and L7 load balancers, with the L4 load balancers placed in front of the L7 load balancers. Facebook has a setup like this, where they use Shiv (an L4 load balancer) in front of Proxygen (an L7 load balancer). You can see a talk about this setup here.
Round Robin
- This is usually the default load balancing method: web servers are selected in round robin order, so you assign requests one by one to each web server and cycle back to the first server after going through the list. Many load balancers also support weighted round robin, where you assign each server a weight and distribute work in proportion to it (a more powerful machine gets a higher weight).
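A toy weighted round robin in Python (the server names and weights are made up; production implementations such as Nginx's smooth weighted round robin interleave the servers more evenly instead of repeating each one back to back):

```python
import itertools

class WeightedRoundRobin:
    """Cycle through servers, repeating each one according to its weight."""

    def __init__(self, weights):
        # weights: {server_name: integer weight}; a weight of 2 means the
        # server appears twice per cycle and so receives twice the requests.
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def next_server(self):
        return next(self._cycle)

lb = WeightedRoundRobin({"big": 2, "small": 1})
print([lb.next_server() for _ in range(6)])
# ['big', 'big', 'small', 'big', 'big', 'small']
```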
An issue with Round Robin scheduling arises when the incoming requests vary in processing time. Round Robin doesn’t consider how much computational time is needed to process a request; it just sends it to the next server in the rotation. If a server is next in the rotation but is stuck processing a time-consuming request, Round Robin will send it another job anyway. This can lead to work skew, where some of the machines in the pool run at far higher utilization than others.

Least Connections (Least Outstanding Requests)
- With this strategy, you look at the number of active connections/requests each web server has and also look at server weights (based on how powerful the server's hardware is). Taking these two into consideration, you send your request to the server with the fewest active connections / outstanding requests. This helps alleviate the work skew issue that can come with Round Robin.

Hashing
- In some scenarios, you’ll want certain requests to always go to the same server in the server pool. You might want all GET requests for a certain URL to go to a certain server in the pool, or you might want all the requests from the same client to always go to the same server (session persistence). Hashing is a good solution for this: you define a key (like the request URL or the client IP address) and the load balancer uses a hash function on that key to determine which server to send the request to. Requests with the same key will always go to the same server, assuming the number of servers stays constant.

Consistent Hashing
- The issue with the hashing approach mentioned above is that adding or removing servers from the server pool messes up the hashing scheme: any time a server is added, most requests will get hashed to a new server. Consistent hashing is a strategy meant to minimize the number of requests that have to be sent to a new server when the server pool size changes. Here’s a great video that explains why consistent hashing is necessary and how it works. There are different consistent hashing algorithms you can use, and the most common one is Ring hash. Maglev is another consistent hashing algorithm, described in a paper Google published in 2016, which has been serving Google’s traffic since 2008.
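To make the remapping benefit concrete, here's a minimal ring hash sketch in Python (the server names, replica count, and key names are arbitrary; real implementations tune the number of points per server and use faster hash functions):

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal ring hash: each server owns several points on a circle, and a
    key is served by the first server point at or after the key's hash."""

    def __init__(self, servers, replicas=100):
        # More replicas per server spreads each server's share of the ring
        # more evenly around the circle.
        self._points = sorted((_hash(f"{s}#{i}"), s)
                              for s in servers for i in range(replicas))
        self._keys = [h for h, _ in self._points]

    def lookup(self, key):
        i = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._points[i][1]

ring3 = Ring(["s1", "s2", "s3"])
before = {k: ring3.lookup(k) for k in (f"client-{i}" for i in range(1000))}

# Add a fourth server and count how many keys change servers.
ring4 = Ring(["s1", "s2", "s3", "s4"])
moved = sum(before[k] != ring4.lookup(k) for k in before)
# Only the keys that fall into s4's new arcs move (roughly 1/4 of them here),
# versus the ~3/4 that a naive hash(key) % N scheme would remap.
print(moved)
```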