Hi Friends,
Welcome to the 6th issue of the Polymathic engineer newsletter. We are a community of 700 people now, thanks again for your support and feedback. Today we will talk about:
What is the TCP protocol
How a TCP connection is established
How TCP implement flow and congestion control
Coding challenge
Tech threads of the week
Interesting tweets of the week
TCP Protocol
The communication between two machines it’s not reliable at the network layer. Data packets are routed from the source to the destination machines through a series of routers. Still, there is no guarantee that these packets will arrive.
This is why TCP is one of the most critical network protocols. It allows a reliable communication channel between 2 processes where data is transmitted in order, not corrupted, and without duplication or gaps.
TCP partitions a data stream of bytes into blocks called segments. Those segments have three essential properties:
they are sequentially numbered so that the receiver can detect duplication or gaps
they include a checksum that is used to check the integrity of the data
they need to be acknowledged by the receiver
Segments that are not acknowledged after a certain amount of time are retransmitted.
TCP Connection
Two processes need to set up a TCP connection before exchanging segments. First, the connection is established through endpoints known as sockets. A socket identifies a process and the machine where it runs, tracking the connection state.
A TCP connection between 2 processes, A and B, is established with a 3-ways handshake:
A sends an SYN segment with a random number x
B sends an SYN/ACK segment with a random number y and x+1
A sends the first data with x+2 and y+1
Establishing a connection requires a full round-trip without application data exchange. The same happens for closing a connection. In addition, when a connection is closed, a socket must stay in a waiting state where all received segments are discarded for a certain amount of time. This is to avoid those segments being part of a newly established connection.
This is why it makes sense to keep a connection open between two processes if it's likely another data exchange will follow. Connection pools are usually used for this.
TCP Flow and Congestion Control
TCP implements a reliable communication channel and mechanisms to prevent the sender from overwhelming the receiver and the underlying network from getting flooded.
Flow control is the mechanism preventing the sender from overwhelming the receiver. The receiver store the segments in a dedicated buffer. The buffer size is communicated to the sender with every acknowledgment. The sender uses this information to avoid sending segments that do not fit into the buffer.
Congestion control is the mechanism preventing the network from getting flooded. The sender tracks the number of segments that can be sent without receiving an ACK. This number is known as congestion windows. The window increases when ACK segments are received within a timeout; otherwise decreases.
The smaller the window, the less bandwidth is used. This means that the shorter the round trip time, the sooner the window grows, and the entire bandwidth gets used. This is one of the reasons why it's so critical having web servers close to the clients.
Coding challenge
The coding challenge of last week was Longest Substring Without Repeating Characters. I love such a problem because its optimal solution combines the sliding window technique with the hash set data structure.
The problem gives as input a string and asks to find the length of the longest substring without repeating characters.
The brute force solution is to analyze all the possible substrings and check which is the longest without repeating characters. Enumerating all the possible substrings takes O(N^2), while the check takes O(N). So the overall solution takes O(N^3) time, where N is the length of the string.
Better performance can be achieved using the sliding window technique. The idea is to maintain a window with all unique characters. The right pointer expands the window at each step. In contrast, the left pointer shrinks the window when a repeated character enters it.
A hash set data structure keeps track of all the characters in the window. This allows us to efficiently detect when the right pointer enters a repeated character in the window. In that case, the leftmost characters are dropped until there are no more duplicate characters.
Once the right pointer iterates the whole string, the maximum window length is the problem's solution.
The time complexity is O(N) because the right pointer iterates over the input string and the left pointer never exceeds the right pointer. Instead, the space complexity is O(min(N, L)), where L is the number of possible characters.
Here is the code of this solution.
The challenge for the next week is Simplify Path.
Tech threads of the week
Patterns for solving recursion problems in coding interviews.

Data partitioning in distributed systems.


Interesting tweets of the week
Logarithms are one of the most crucial mathematical concepts to know as developers. An interesting thread will introduce logarithms from a historical point of view, explaining their real-life applications. Definitely, something which will make you appreciate more logarithms and mathematics in general.


Reading technical books is fundamental to improving as a developer. My friend Thiago put together a very well-curated list of technical books. I've read many of them and plan to go through others next year.