Hi Friends,
Welcome to the 69th issue of the Polymathic Engineer newsletter.
This week's article is focused on HTTP, a protocol that has been the foundation of data communication on the World Wide Web.
The outline will be as follows:
What is HTTP
Requests
Responses
Caching
HTTP evolution
HTTP/1
HTTP/2
HTTP/3
Introduction
HTTP is the protocol the World Wide Web uses for transmitting data.
The acronym stands for HyperText Transfer Protocol because HTTP was initially created to handle hypertext: text with links (hyperlinks) that connect to other texts or resources like images and videos.
This concept is implemented by HTML (HyperText Markup Language), which structures various web resources and links to form the web pages we interact with daily.
HTTP works at the application layer of the Internet protocol suite and is designed to enable communication between clients and servers. It uses a request-response paradigm.
A client, usually a web browser, sends an HTTP request to the server. This request includes a method, such as GET or POST, that specifies what action should be performed.
The server processes the request and sends back a response. This response contains the request status (like 200 OK or 404 Not Found) and the requested content if the request was successful.
An important point is that HTTP is inherently stateless, meaning each request-response pair is independent. The server does not retain any session information between different requests from the same client.
However, technologies such as cookies have been developed so that a server can recognize a client and keep state across requests.
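To make the idea concrete, here is a minimal sketch of how a cookie can carry a session identifier over an otherwise stateless exchange. The function handle_request, the cookie name session_id, and the in-memory dictionary are assumptions made up for illustration; a real server would use a proper framework and unpredictable session IDs.

```python
from http.cookies import SimpleCookie
from itertools import count

_session_ids = count(1)   # hypothetical ID generator, for illustration only
visits_by_session = {}    # server-side state, keyed by session ID


def handle_request(request_headers):
    """Handle one independent request and return (response_headers, body)."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    response_headers = {}
    if "session_id" in cookie and cookie["session_id"].value in visits_by_session:
        # The client sent back a cookie we issued earlier.
        session_id = cookie["session_id"].value
    else:
        # Unknown client: start a session and ask the client to store its ID.
        session_id = f"s{next(_session_ids)}"
        visits_by_session[session_id] = 0
        response_headers["Set-Cookie"] = f"session_id={session_id}"
    visits_by_session[session_id] += 1
    return response_headers, f"visit {visits_by_session[session_id]}"


# The first request carries no cookie, so the server creates a session.
first_headers, first_body = handle_request({})
# The client echoes the cookie back, which lets the server link the requests.
_, second_body = handle_request({"Cookie": first_headers["Set-Cookie"]})
```

The protocol itself stays stateless: the only thing connecting the two calls is the Cookie header the client chooses to send.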
HTTP Requests
A web server hosts resources like documents, images, or collections of other resources addressable by a single URL.
HTTP requests allow you to Create, Read, Update, or Delete these resources. Each request is stateless and contains all the necessary information to process it.
In HTTP/1.x, request messages are written in plain ASCII text and designed to be simple yet robust. A typical HTTP request message consists of a request line, a set of header lines, and, for methods such as POST, an optional body.
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr
The request line includes the method (GET, POST, etc.), the URL of the requested resource, and the HTTP version. For example, "GET /somedir/page.html HTTP/1.1" specifies a GET request for a resource located at "/somedir/page.html" using HTTP version 1.1.
The header lines are key-value pairs of metadata providing additional information about the request. Some typical headers are:
Host: Specifies the server's domain name.
Connection: Indicates whether the transport-layer connection should be kept open or closed after the response is sent.
User-Agent: Identifies the client software, for example the browser and its version.
Accept-Language: Suggests preferred languages for the response.
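The structure described above can be sketched with a small parser that splits an HTTP/1.1 request into its request line and headers. The function name parse_request is an assumption for illustration, and the sketch ignores edge cases that real servers must handle.

```python
def parse_request(raw):
    """Split a raw HTTP/1.x request into method, target, version, headers, body."""
    # An empty line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: method, request target, and protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw_request = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "Connection: close\r\n"
    "User-agent: Mozilla/5.0\r\n"
    "Accept-language: fr\r\n"
    "\r\n"
)

method, target, version, headers, body = parse_request(raw_request)
```

Normalizing header names to lowercase means "User-agent" and "User-Agent" end up under the same key.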
The most important part of an HTTP request is the method defining the operation on the requested resource. A request method can have two important properties: safety and idempotency.
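Safety means the method does not change the state of the server: from the client's point of view, it is read-only. Responses to safe methods such as GET can usually be cached.
Idempotency means that executing the same request several times leaves the server in the same state as executing it once.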
The most commonly used methods are:
GET: Retrieves a resource without affecting it. It is safe and idempotent and is typically used for actions like navigating web pages or downloading files.
POST: Creates a resource, returning its URL. It is neither safe nor idempotent and is used for actions like submitting forms or uploading files. POST requests include a payload body containing the data entered by the user.
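PUT: Updates an existing resource by replacing it with the data sent in the request, or creates it at the given URL if it does not exist yet. It is idempotent but not safe.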
DELETE: Removes a resource. It is idempotent but not safe.
Despite these conventions, the protocol does not enforce how methods are actually used. Developers sometimes use methods like GET for actions that change data, which can lead to unintended consequences, for example when a crawler or a prefetching browser follows a link that deletes something.
Implementing HTTP methods according to their intended purposes is crucial to maintaining the integrity and security of web interactions.
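The difference between idempotent and non-idempotent methods can be illustrated with a toy in-memory store. The ResourceStore class and its "/items/N" URLs are hypothetical and only mimic the semantics described above; no real HTTP traffic is involved.

```python
class ResourceStore:
    """Toy server state that mimics the semantics of the four methods."""

    def __init__(self):
        self.items = {}
        self._next_id = 1

    def get(self, url):
        # Safe: reads state and never changes it.
        return self.items.get(url)

    def post(self, data):
        # Not idempotent: every call creates a new resource with a new URL.
        url = f"/items/{self._next_id}"
        self._next_id += 1
        self.items[url] = data
        return url

    def put(self, url, data):
        # Idempotent: repeating the call leaves the same state behind.
        self.items[url] = data

    def delete(self, url):
        # Idempotent: deleting an already deleted resource changes nothing.
        self.items.pop(url, None)


store = ResourceStore()
first_url = store.post({"name": "a"})
second_url = store.post({"name": "a"})   # same payload, yet a second resource
store.put(first_url, {"name": "b"})
store.put(first_url, {"name": "b"})      # repeated PUT, same final state
store.delete(second_url)
store.delete(second_url)                 # repeated DELETE, same final state
```

This is why a client can safely retry a PUT or DELETE after a timeout, while retrying a POST may create a duplicate.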
HTTP Responses
An HTTP response message has a structure similar to a request's: a status line, a set of header lines, and the body.
HTTP/1.1 200 OK
Connection: close
Date: Tue, 18 Aug 2015 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data ...)
The status line includes the HTTP version, a status code, and a text description indicating whether the request was successful.
There are different code ranges: 100-199 are informational, 200-299 indicate success, 300-399 communicate a redirection, 400-499 indicate client errors, and 500-599 indicate server errors.
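Because the first digit of a status code determines its class, a client can classify any response with simple integer division. The helper below is a small sketch; the name status_class is an assumption, and it also covers the informational 100-199 range defined by the HTTP specification.

```python
def status_class(code):
    """Return the class of an HTTP status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes[code // 100]


ok_class = status_class(200)
not_found_class = status_class(404)
unavailable_class = status_class(503)
```

Checking the class rather than the exact code lets a client handle codes it does not know, for instance by treating any unknown 4xx code as a generic client error.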
The header lines provide metadata about the response, such as:
Connection: Typically mirrors the request's connection directive.
Date: The date and time the response was sent.
Server: Specifies the software used by the server.
Last-Modified: Shows when the content was last changed, which is crucial for caching mechanisms.
Content-Length: The size of the response body in bytes.
Content-Type: Indicates the format of the returning data (e.g., "text/html").
Content-Encoding: Specifies the compression applied to the response body (e.g., "gzip"), which the client needs to know in order to decompress the data.
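As a rough sketch of how a client might use these headers, the function below decompresses a response body according to Content-Encoding before decoding it as text. The name decode_body is hypothetical, only gzip is handled, and UTF-8 is assumed as the character set because the example headers do not declare one.

```python
import gzip


def decode_body(headers, body):
    """Undo the content encoding of a response body and return it as text."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        body = gzip.decompress(body)
    elif encoding != "identity":
        raise ValueError(f"unsupported Content-Encoding: {encoding}")
    # Assumption: the text is UTF-8; a real client would read the charset
    # parameter of the Content-Type header instead.
    return body.decode("utf-8")


page = b"<html><body>Hello</body></html>"
compressed = gzip.compress(page)
response_headers = {
    "Content-Type": "text/html",
    "Content-Encoding": "gzip",
    # Content-Length describes the bytes on the wire, i.e. the compressed size.
    "Content-Length": str(len(compressed)),
}
text = decode_body(response_headers, compressed)
```

Note that Content-Length refers to the encoded body as transmitted, not to the size after decompression.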
HTTP Caching
Web servers host two types of resources: static resources, such as HTML documents, images, and CSS files, which contain data that doesn't change between requests, and dynamic resources, which are generated by the server on the fly for each request.