Hi Friends,
Welcome to the 76th issue of the Polymathic Engineer newsletter.
This week, we will explore the YouTube backend and examine the technologies that allow it to serve high-quality and low-latency videos.
The outline will be as follows:
What is YouTube
Video Transcoding
Adaptive Streaming
YouTube's video delivery architecture
What is YouTube
YouTube, founded in 2005 and bought by Google in 2006, has become the biggest and most popular place to share and watch videos. Over 2 billion people log in to YouTube every month, and more than 500 hours of video are uploaded every minute.
This massive amount of video material needs a complex and robust video delivery system to ensure that users worldwide can watch videos without any problems, regardless of their device or network setup.
At its core, YouTube's success depends on its ability to offer high-quality videos quickly. This requires many different technologies, such as encoding, storing, caching, and networking, all working together to give users the best experience possible.
Whether you're watching a tutorial, a music video, or a live stream, the underlying architecture is designed to handle diverse content types and varying demands with little to no buffering and high-resolution playback.
In the following sections, we'll discuss the most important ideas for delivering videos and describe YouTube's complex video delivery system in detail.
Video Transcoding
Video transcoding is a crucial concept in understanding how YouTube delivers videos efficiently.
Before a video becomes available for playback, YouTube encodes it into different formats and resolutions, such as 240p, 360p, 480p, 720p, 1080p, and 4K. This ensures that the video streams smoothly on various devices and screen sizes.
Efficient transcoding is critical because it directly affects both the playback quality and the bandwidth consumed. In this regard, it is important to distinguish between lossless and lossy transcoding.
Lossless transcoding keeps the original data from the source video, so the quality stays the same. It is helpful for archiving videos but impractical for streaming over the internet due to the large size of the files.
On the other hand, lossy transcoding shrinks the video by discarding some of the data, which reduces the file size at the cost of some quality. A good lossy transcode finds a balance where the loss in quality is imperceptible to the average viewer while the file size drops significantly.
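To get a feel for why untouched video data is impractical to stream, here is a rough back-of-the-envelope sketch. It assumes raw 8-bit RGB frames (24 bits per pixel) at 30 frames per second and an illustrative 5 Mbit/s lossy target for 1080p. Both figures are assumptions chosen for the arithmetic, not numbers published by YouTube, and a real lossless codec would land somewhere below the raw figure.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> int:
    """Bits per second of uncompressed video (assumes no chroma subsampling)."""
    return width * height * bits_per_pixel * fps


def size_bytes(bitrate_bps: float, seconds: float) -> float:
    """Size in bytes of a stream with a constant bitrate."""
    return bitrate_bps * seconds / 8


raw = raw_bitrate_bps(1920, 1080, 30)   # 1,492,992,000 bits/s, about 1.5 Gbit/s
lossy = 5_000_000                        # assumed 5 Mbit/s lossy target

ten_minutes = 600
print(size_bytes(raw, ten_minutes) / 1e9)    # about 112 GB uncompressed
print(size_bytes(lossy, ten_minutes) / 1e6)  # 375 MB at the lossy target
print(raw / lossy)                           # roughly a 300x reduction
```

Even if the assumed numbers are off by a wide margin, the gap is large enough to show why lossy compression is the only realistic option for streaming.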
The algorithms used for lossy compression are known as codecs. A codec compresses and decompresses video, and it plays a central role in video delivery because it determines how efficiently the video is compressed. YouTube commonly uses the H.264, VP9, and AV1 codecs.
H.264 is widely used because it strikes a good balance between quality and compression speed. VP9, developed by Google, can stream high-definition video at lower bitrates. AV1 offers even better compression, which is important for sending 4K and higher-resolution videos without using too much bandwidth.
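As a minimal sketch of what producing several resolutions from one source could look like, the snippet below builds one FFmpeg command per output resolution. It only constructs the argument lists and does not run them. The bitrates, the ladder itself, and the output file naming are illustrative assumptions, and the `libx264` encoder name assumes an FFmpeg build that includes it. This is not a description of YouTube's internal tooling.

```python
from pathlib import Path

# Hypothetical ladder: (output height in pixels, target video bitrate).
LADDER = [(360, "800k"), (720, "2500k"), (1080, "5000k")]


def build_ladder_commands(source: str, encoder: str = "libx264") -> list[list[str]]:
    """Return one ffmpeg argument list per rung of the ladder."""
    stem = Path(source).stem
    commands = []
    for height, bitrate in LADDER:
        commands.append([
            "ffmpeg", "-i", source,
            "-vf", f"scale=-2:{height}",   # keep aspect ratio, force an even width
            "-c:v", encoder, "-b:v", bitrate,
            "-c:a", "aac", "-b:a", "128k",
            f"{stem}_{height}p.mp4",
        ])
    return commands


for cmd in build_ladder_commands("input.mp4"):
    print(" ".join(cmd))
```

Each printed line could be passed to `subprocess.run` on a machine with FFmpeg installed. Switching to another codec would mean changing the encoder name and, most likely, the container and bitrate targets as well.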
Adaptive Streaming
Adaptive Bitrate Streaming (ABR) is a well-known technique for sending videos to clients.
YouTube uses it to change the quality of a video stream in real time as the user's network conditions change, improving the watching experience.
HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH) are well-known adaptive streaming protocols.
These protocols break the video into small chunks stored at different bit rates and resolutions. During playback, the player dynamically switches between these chunks on the fly, picking the best quality that the current network bandwidth can handle.
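The chunk selection logic can be sketched as a simple throughput-based rule: pick the highest bitrate that fits within a safety margin of the measured bandwidth. The ladder, the bitrates, and the 0.8 safety factor below are illustrative assumptions. Real HLS and DASH players use more elaborate heuristics that also consider buffer levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical bitrate ladder, for illustration only.
LADDER = [
    Rendition("240p", 400),
    Rendition("360p", 800),
    Rendition("480p", 1200),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]


def pick_rendition(throughput_kbps: float, ladder=LADDER, safety: float = 0.8) -> Rendition:
    """Choose the best rendition whose bitrate fits in a fraction of the throughput."""
    budget = throughput_kbps * safety
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest quality rather than stalling.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)


# The player would call this before requesting each new chunk.
for measured in (10_000, 4_000, 300):
    print(measured, "->", pick_rendition(measured).name)
```

With these assumed numbers, a 4,000 kbps measurement leaves a 3,200 kbps budget, so the player requests the 720p chunk next and steps down only if the measurement drops.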
Adaptive Bitrate Streaming is generally preferable to Progressive Streaming, which sends a single file to the client regardless of its network speed and CPU power.
YouTube's video delivery architecture
YouTube's video delivery architecture is a complex system that uses advanced transcoding, caching, and adaptive streaming methods to ensure fast and smooth video playback.
When a video is uploaded to YouTube, it first goes through an initial transcoding step that converts the original file into a common high-quality intermediate format.
This step standardizes the video so it can be processed the same way throughout the pipeline. After that, the video is cut into smaller pieces and encoded into different output formats and sizes.
The segments are transcoded in parallel across several machines. Working on multiple segments at the same time improves throughput and cuts down the time it takes for a video to become available.
After this first transcoding pass, each video segment is ready for adaptive streaming.
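The segment-and-parallelize idea can be sketched on a single machine with a worker pool. In this sketch, `transcode_segment` is a hypothetical stand-in that only returns an output name, the 5-second segment length is an assumption, and a thread pool replaces what would be a fleet of machines in a real pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s: float, segment_s: float = 5.0) -> list[tuple[float, float]]:
    """Cut a video timeline into (start, end) windows of at most segment_s seconds."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcode_segment(segment: tuple[float, float], resolution: str) -> str:
    """Hypothetical stand-in for the real encoder call; returns an output name."""
    start, end = segment
    return f"{resolution}/{start:06.1f}-{end:06.1f}.seg"


def transcode_video(duration_s: float, resolutions=("360p", "720p"), workers: int = 4) -> list[str]:
    """Transcode every (segment, resolution) pair concurrently."""
    segments = split_into_segments(duration_s)
    jobs = [(seg, res) for res in resolutions for seg in segments]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves job order, so outputs line up with the jobs list.
        return list(pool.map(lambda job: transcode_segment(*job), jobs))


print(transcode_video(12.0))
```

Because every segment is independent, the total wall-clock time shrinks roughly with the number of workers, which is the property the pipeline relies on.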
ABR protocols are used to dynamically deliver video content, letting the video player choose the best quality video based on the user's device and network settings.
For videos that get a lot of views, YouTube does a second transcoding pass. This extra work uses more compute power to make the files smaller while keeping the same perceived quality for the viewer.
The higher processing cost is worth it because it allows sending higher resolution videos at the same network bandwidth and is amortized across many playbacks.
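The amortization argument can be made concrete with a small break-even calculation. All the numbers below are made up for illustration (an extra compute cost of 2.00 units, 20 MB saved per playback, and a delivery cost of 0.01 units per GB). The source gives no actual figures.

```python
def break_even_playbacks(extra_compute_cost: float,
                         megabytes_saved_per_playback: float,
                         delivery_cost_per_gb: float) -> float:
    """Number of playbacks after which the second pass pays for itself."""
    saving_per_playback = megabytes_saved_per_playback / 1000 * delivery_cost_per_gb
    if saving_per_playback <= 0:
        raise ValueError("the second pass must save bandwidth to ever break even")
    return extra_compute_cost / saving_per_playback


# Hypothetical inputs: the result is roughly 10,000 playbacks.
print(break_even_playbacks(2.00, 20, 0.01))
```

Under these assumptions, a video watched a few hundred times never recovers the extra compute, while one watched millions of times recovers it many times over, which is why the second pass is reserved for popular videos.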
After transcoding, the video segments are distributed to a Content Delivery Network (CDN) to minimize latency. When a user requests a video, the system first checks whether the content is available in the edge caches close to that user.
If not, the request is forwarded to the nearest CDN node with the required video segment. This distributed caching strategy ensures rapid access to video content and reduces the load on central servers.
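The lookup path described above (check the edge, otherwise go upstream) can be sketched with a small in-memory cache. The least-recently-used eviction policy, the capacity, and the `fetch_from_upstream` callback are assumptions for illustration. The source does not say which policy YouTube's caches use.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Tiny LRU cache that falls back to an upstream node on a miss."""

    def __init__(self, capacity: int, fetch_from_upstream: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_upstream = fetch_from_upstream
        self.store: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, segment_id: str) -> bytes:
        if segment_id in self.store:
            self.store.move_to_end(segment_id)   # mark as most recently used
            self.hits += 1
            return self.store[segment_id]
        self.misses += 1
        data = self.fetch_from_upstream(segment_id)
        self.store[segment_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict the least recently used
        return data


# Hypothetical upstream node that just fabricates segment bytes.
cache = EdgeCache(capacity=2, fetch_from_upstream=lambda sid: f"bytes-of-{sid}".encode())
for sid in ["a", "a", "b", "c", "a"]:
    cache.get(sid)
print(cache.hits, cache.misses)
```

Popular segments stay in the edge cache and are served locally, while rarely requested ones are fetched upstream on demand and eventually evicted.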
In addition, YouTube uses playback data from the client's device to estimate performance in real time. By looking at metrics such as how often a client switches to a lower resolution, YouTube can predict whether higher-resolution content can be streamed effectively.
This prediction helps make the best use of bandwidth and improves the user experience.
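One way to picture this kind of signal is a downshift rate computed from the resolutions a client actually played. The metric definition and the 10% threshold below are assumptions made for illustration, not a description of YouTube's real prediction model.

```python
def downshift_rate(resolution_history: list[int]) -> float:
    """Fraction of chunk-to-chunk transitions where the resolution went down."""
    transitions = list(zip(resolution_history, resolution_history[1:]))
    if not transitions:
        return 0.0
    downshifts = sum(1 for previous, current in transitions if current < previous)
    return downshifts / len(transitions)


def can_try_higher_resolution(resolution_history: list[int], max_rate: float = 0.1) -> bool:
    """Assumed rule: a client that rarely downshifts can probably handle more."""
    return downshift_rate(resolution_history) <= max_rate


stable = [720, 720, 720, 720, 720]      # heights in pixels, one entry per chunk
unstable = [1080, 720, 1080, 720, 480]

print(downshift_rate(stable), can_try_higher_resolution(stable))      # 0.0 True
print(downshift_rate(unstable), can_try_higher_resolution(unstable))  # 0.75 False
```

A stable history suggests there is headroom to offer a higher resolution, while frequent downshifts suggest the client is already at its limit.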
Food for thought
Asking interview questions like "What's the difference between passing by value and passing by reference?" is not a bad idea. Such basics are important, and I'm always surprised by how many engineers do not know how they work.
LLMs will keep getting better at writing code. This is the right time to start thinking of yourself as a problem solver rather than only a programmer.
Having too expensive a lifestyle is like being in a cage you can't escape from. At the end of the day, there is nothing more important or valuable than your freedom and time.
I'm curious what people are using for video/audio transcoding nowadays. Is it still FFmpeg?