A Critical Guide to Content Delivery Networks
How CDNs work, and how to get the most out of them.
Hi Friends,
Welcome to the 136th edition of the Polymathic Engineer.
This week, we do a deep dive into Content Delivery Networks. This is the outline:
What is a CDN
Benefits of Using a CDN
CDN Architecture
How CDNs Work
How is the optimal edge server found?
Beyond Caching: The Power of the Overlay Network
Best Practices for CDN Optimization
Real-World CDN Examples
What is a Content Delivery Network?
In today's digital world, how fast a website loads has a direct effect on how users feel and how well the business does. Even a short delay can cause more visitors to leave your site, which costs you money.
For example, Google reported that the probability of a visitor bouncing (leaving a web page) increases by 32% as a page's load time goes from one second to three seconds. For e-commerce websites like Amazon, conversion rates drop by 0.3% for every additional second a page takes to load.
This is where Content Delivery Networks (CDNs) come into play.
A CDN is a geographically distributed network of servers that work together to send content quickly and reliably across the world. When a user requests content from a website using a CDN, the request is directed to the nearest CDN server rather than traveling to the origin server, which might be thousands of miles away.
You can think of a CDN as a network of ATMs. If money were available only from one bank office in a city, everyone would have to make time-consuming trips to get there. However, with ATMs spread out across the city, anyone can quickly get their money. In the same way, CDNs place content closer to users, reducing the physical distance data must travel.
CDNs are becoming more and more important to the modern web. Major providers like Akamai, Cloudflare, and Amazon CloudFront keep expanding their global reach and improving their services.
In the following sections, we will discuss what benefits CDNs bring and how they work under the hood.
Benefits of Using a CDN
The clearest benefit of a CDN is that it makes pages load faster. If the client is on the other side of the world, the response time will be over 100ms no matter how fast the server is. This is because of network latency (the time it takes for data to travel between the server and the user's device), which is bounded by the speed of light.
A CDN cuts down on latency by serving content from servers that are closer to the clients. It also improves reliability, because the error rate is much higher when data travels long distances over the public internet.
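To get a feeling for the speed-of-light limit, here is a minimal back-of-the-envelope sketch. It assumes that signals in optical fiber travel at roughly two thirds of the speed of light in a vacuum and that the cable follows a straight path, which real routes never do. The function name and the sample distances are made up for illustration.

```python
# Rough lower bound on round-trip time caused by propagation delay alone.
# Assumption: light in optical fiber travels at about 2/3 of its vacuum speed.
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumed typical value, not a measured one


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip time in milliseconds over a straight fiber path."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


# Hypothetical distances between a client and the server answering it.
for label, km in [
    ("same metro area", 50),
    ("across a continent", 4_000),
    ("far side of the planet", 20_000),
]:
    print(f"{label:>24}: {min_rtt_ms(km):6.1f} ms")
```

Under these assumptions, a server 20,000 km away cannot answer in less than about 200 ms, while one in the same metro area is bounded by well under a millisecond. Real numbers are higher, since routing, queuing, and connection handshakes all add up.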
A second clear benefit is reduced bandwidth usage. Keeping a busy website up and running can cost a lot of money. Because they cache content and absorb a large share of the traffic, CDNs help lower these costs.
When content is served from the CDN's cache rather than your origin server, you save on the bandwidth costs charged by your hosting provider. In addition, CDNs use file compression and other optimization methods to reduce the size of the data being sent.
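The bandwidth saving is easy to estimate. The sketch below is a simplification: it assumes the share of requests answered from the CDN cache translates directly into the share of bytes the origin no longer has to send, which only holds if objects are of similar size. The traffic figure is hypothetical.

```python
def origin_egress_gb(total_gb: float, cache_hit_ratio: float) -> float:
    """Traffic the origin still has to serve, given the share answered from the CDN cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_gb * (1.0 - cache_hit_ratio)


monthly_gb = 10_000  # hypothetical monthly traffic of the website
for ratio in (0.0, 0.80, 0.95):
    remaining = origin_egress_gb(monthly_gb, ratio)
    print(f"hit ratio {ratio:.0%}: origin serves {remaining:,.0f} GB")
```

The takeaway is that the last few percentage points matter a lot: going from 80% to 95% of traffic served from cache cuts the origin's share by a factor of four.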
But it's not only about performance and scalability. CDNs also help to make things more reliable and build redundancy.
With a CDN, more servers handle user requests, and several copies of the data are stored. This redundancy is especially helpful when your site suddenly gets a spike of users, which could otherwise crash the origin server. If one of the edge servers fails, users can fail over to other edge locations and get the content from there.
Modern CDNs also help protect your website from different threats. They can, for example, absorb Distributed Denial of Service (DDoS) attacks by spreading the attack traffic across their entire network. This way, attackers cannot overwhelm the origin server. CDNs can also offer Web Application Firewall (WAF) features that block malicious requests before they reach your server.
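The failover idea can be sketched in a few lines. This is only an illustration of the concept: in a real CDN, the rerouting is usually done by the network itself rather than by a loop like this one. The location names, the EdgeUnavailable exception, and the fetcher functions are all hypothetical.

```python
from typing import Callable, Sequence


class EdgeUnavailable(Exception):
    """Raised by a fetcher when its edge location cannot serve the request."""


def fetch_with_failover(
    path: str,
    edges: Sequence[tuple[str, Callable[[str], bytes]]],
) -> tuple[str, bytes]:
    """Try edge locations in preference order and return the first success."""
    errors = []
    for name, fetch in edges:
        try:
            return name, fetch(path)
        except EdgeUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise EdgeUnavailable("all edge locations failed: " + "; ".join(errors))


def healthy(path: str) -> bytes:
    # Stand-in for an edge location that is up and has the content.
    return f"content of {path}".encode()


def down(path: str) -> bytes:
    # Stand-in for an edge location that is currently unreachable.
    raise EdgeUnavailable("connection refused")


served_by, body = fetch_with_failover("/logo.png", [("fra", down), ("ams", healthy)])
print(served_by, body)
```

Here the preferred location "fra" is down, so the request is transparently served by "ams" and the user still gets the content.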
CDN Architecture
At their core, CDNs are distributed caching systems that work as an overlay network built on top of the internet. Before understanding exactly what this means and how CDNs work, let’s first have a look at their four main components: edge servers, origin servers, control plane, and monitoring.
Edge servers are the backbone of any CDN. These are the machines that store cached content and deliver it to nearby users. They are set up in Points of Presence (PoPs), which are data centers strategically placed around the world.
Each PoP contains multiple edge servers equipped with fast storage systems to cache content, powerful processors to handle request routing and content delivery, and network interfaces optimized for high-throughput data transfer.
The number of PoPs and their geographic distribution vary from one CDN provider to another. Big providers like Akamai and Cloudflare run hundreds to thousands of PoPs around the world. Smaller CDNs may focus on specific regions.
The more PoPs a CDN has, the closer its edge servers can be to end users, resulting in lower latency. However, having more PoPs creates a trade-off: because requests for the same content are spread across more locations, each one may have a lower cache hit ratio, which is the fraction of requests that can be answered straight from the cache.
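A toy simulation makes the trade-off visible. It is a deliberately extreme sketch: it assumes a fixed amount of traffic that is dealt evenly across PoPs, caches that never evict anything, and a made-up workload of 100 objects requested 10 times each. In practice, adding PoPs in new regions also brings new traffic, so the effect is milder.

```python
from collections import defaultdict


def hit_ratio(requests: list[tuple[int, str]]) -> float:
    """Fraction of requests served from cache, with one unbounded cache per PoP.

    Each request is (pop_id, object_key). The first request for a key
    at a given PoP is a miss; every later one at that PoP is a hit.
    """
    cached: dict[int, set[str]] = defaultdict(set)
    hits = 0
    for pop, key in requests:
        if key in cached[pop]:
            hits += 1
        else:
            cached[pop].add(key)
    return hits / len(requests)


def spread(
    num_pops: int, num_objects: int = 100, requests_per_object: int = 10
) -> list[tuple[int, str]]:
    """Hypothetical workload: each object's requests are dealt round-robin across PoPs."""
    return [
        (i % num_pops, f"obj-{obj}")
        for obj in range(num_objects)
        for i in range(requests_per_object)
    ]


for pops in (1, 5, 10):
    print(f"{pops:>2} PoPs: hit ratio {hit_ratio(spread(pops)):.0%}")
```

With the same 1,000 requests, a single PoP reaches a 90% hit ratio, five PoPs reach 50%, and ten PoPs never see the same object twice, so every request is a miss.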
Origin servers are where the original content lives. These are usually the web servers, application servers, or storage systems that hold the most up-to-date copies of your data.
When a CDN edge server needs content that isn't in its cache, it connects to the origin server to get the latest version. To make the connection between edge and origin servers as fast and reliable as possible, many CDNs route this traffic over dedicated private network connections or optimized paths instead of relying only on the public internet.
Some CDN architectures include origin shield servers, which act as an intermediate cache layer between your origin server and the CDN's edge servers. They consolidate cache misses from several edge servers, so the origin server receives far fewer requests.
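The effect of an origin shield can be sketched with two tiny classes. This is a simplified model, not how any specific CDN implements it: caches never expire, requests arrive one after another, and the Origin and Cache classes are hypothetical names introduced for the example.

```python
class Origin:
    """Stand-in for the origin server; counts how often it is asked for content."""

    def __init__(self) -> None:
        self.fetches = 0

    def get(self, key: str) -> str:
        self.fetches += 1
        return f"body:{key}"


class Cache:
    """A cache tier that asks its upstream (another cache or the origin) on a miss."""

    def __init__(self, upstream) -> None:
        self.upstream = upstream
        self.store: dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self.store:
            self.store[key] = self.upstream.get(key)
        return self.store[key]


def origin_fetches(num_edges: int, use_shield: bool) -> int:
    """Count origin requests when every edge gets a first (cold) request for one file."""
    origin = Origin()
    upstream = Cache(origin) if use_shield else origin
    edges = [Cache(upstream) for _ in range(num_edges)]
    for edge in edges:
        edge.get("/video.mp4")
    return origin.fetches


print(origin_fetches(20, use_shield=False))  # 20: every cold edge hits the origin
print(origin_fetches(20, use_shield=True))   # 1: only the shield's miss reaches the origin
```

Without the shield, the number of origin requests grows with the number of edge servers. With it, the origin sees each object once, no matter how many edges are cold.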
The control plane is the brain of the CDN. It controls how the whole network works, which includes:
applying and updating settings across all edge servers
load balancing traffic to prevent any single server from becoming overwhelmed
continuously checking the status of edge servers and rerouting traffic away from any that become unavailable
choosing which edge server should handle each request based on things like how close it is, how busy the server is, and what content is available
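The last responsibility in the list, choosing an edge server, can be pictured as a scoring problem. The sketch below is one possible way to combine proximity, load, and content availability. The EdgeServer fields, the weights, and the server names are assumptions made for illustration, not values used by any real CDN.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    rtt_ms: float      # estimated round-trip time to the client
    load: float        # current utilization, 0.0 (idle) to 1.0 (saturated)
    healthy: bool      # result of the latest health check
    has_content: bool  # whether the requested object is already cached


def score(edge: EdgeServer) -> float:
    """Lower is better. The weights are illustrative assumptions, not tuned values."""
    load_penalty = 100.0 * edge.load
    miss_penalty = 0.0 if edge.has_content else 50.0
    return edge.rtt_ms + load_penalty + miss_penalty


def choose_edge(edges: list[EdgeServer]) -> EdgeServer:
    """Drop unhealthy servers, then pick the one with the lowest score."""
    candidates = [e for e in edges if e.healthy]
    if not candidates:
        raise RuntimeError("no healthy edge servers available")
    return min(candidates, key=score)


edges = [
    EdgeServer("paris-1", rtt_ms=8, load=0.95, healthy=True, has_content=True),    # close but nearly saturated
    EdgeServer("paris-2", rtt_ms=9, load=0.30, healthy=True, has_content=False),   # close and idle, cold cache
    EdgeServer("london-1", rtt_ms=18, load=0.40, healthy=True, has_content=True),  # farther, warm cache
    EdgeServer("madrid-1", rtt_ms=5, load=0.10, healthy=False, has_content=True),  # failed its health check
]
print(choose_edge(edges).name)  # london-1
```

Note that the closest server does not win here: madrid-1 is excluded because it is unhealthy, paris-1 is too busy, and paris-2 would have to fetch the content first. With these weights, the slightly more distant london-1 is the best choice.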