Hi Friends,
Welcome to the 61st issue of the Polymathic Engineer newsletter.
This week, we discuss a fundamental aspect of modern API design that addresses the challenges of handling large datasets: API pagination.
The outline will be as follows:
What is API pagination
Why API pagination
Offset-based pagination
Page-based pagination
Keyset-based pagination
Cursor-based pagination
Best Practices
Posts that made me think
What is API pagination
API pagination is a technique for designing APIs so that data from large data sets is easier to retrieve. Instead of returning all the data in a single response, pagination divides the data into smaller, more digestible chunks, known as "pages."
Each page contains a predefined number of records, allowing API consumers to request subsequent pages to access additional data incrementally.
Pagination typically utilizes parameters to define the size and position of the data subset for each page. These parameters effectively specify where a page starts and how many records it includes, enabling precise control over data retrieval.
Why is API Pagination Important
Using API pagination adds some complexity but brings many benefits:
Improved Performance. By breaking down data into manageable pages, API pagination significantly reduces response times and makes API calls more efficient.
Reduced Resource Usage. Retrieving a subset of data minimizes the server load and the required network bandwidth, ensuring swift and smooth data exchanges.
Better User Experience. Paginated APIs enable rendering data faster and navigating large datasets incrementally. This avoids overwhelming users with excessive information at once, allowing a more intuitive and user-friendly interaction with the applications.
Scalability and Flexibility. As data volumes grow, paginated APIs remain robust, preventing system overloads and maintaining efficient data retrieval across various devices and use cases.
Error Handling. In paginated systems, if data retrieval errors occur, they are isolated to specific pages. This simplifies error recovery, ensuring that only the affected page needs reprocessing.
Offset-based pagination
The API uses two parameters: "offset" determines the starting position in the dataset, while "limit" specifies the maximum number of records to include on each page. This approach became popular with apps built on SQL, which already has LIMIT and OFFSET clauses in its syntax.
For example, an API request could include parameters like "offset=40" and "limit=20" to retrieve records 41 to 60.
Example: GET /products?limit=20&offset=40
Pros: It's simple to set up and lets you go straight to any page.
Cons: There are performance issues with large OFFSET values. It can cause data to be duplicated or skipped in dynamic datasets.
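As a sketch, offset-based pagination can be mimicked with a simple list slice. The in-memory products list and the function name here are illustrative; a real API would translate these parameters into a LIMIT/OFFSET query against the database.

```python
# Illustrative sketch: offset-based pagination over an in-memory list.
# A real API would push offset/limit down to the database instead.
def paginate_offset(records, offset=0, limit=20):
    """Return at most `limit` records starting at position `offset`."""
    return records[offset:offset + limit]

products = [f"product-{i}" for i in range(1, 101)]  # 100 fake records

page = paginate_offset(products, offset=40, limit=20)
print(page[0], page[-1])  # product-41 product-60
```

Note that the server still has to skip over the first `offset` rows; a SQL database does the same scan-and-discard work for a large OFFSET, which is where the performance cost comes from.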
Page-based pagination
The API uses a "page" parameter to specify the desired page number. The API responds with the corresponding page and other metadata like the total number of pages or total record count.
Example: GET /products?page=20&limit=20
Pros: It is more predictable than offset-based, eliminating the chances of navigating to nonexistent pages.
Cons: It has the other drawback of offset-based pagination when dealing with large and dynamic data sets.
Keyset-based pagination
This approach relies on sorting the data and using a unique attribute or key (like since_id) to determine the starting point for retrieving the next page.
The first request doesn't contain the key parameter. Subsequent requests use the last key value from the previous page.
Example: GET /products?limit=20&since_id=20
Pros: It is more efficient than offset-based and ensures efficient retrieval of consecutive pages without duplication or missing records, even in dynamic data sets.
Cons: It is tied to the sorting order and does not allow going straight to a specific page.
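A sketch of the keyset idea, under the assumption that records are sorted by an ascending numeric id; in SQL this typically translates to WHERE id > :since_id ORDER BY id LIMIT :limit.

```python
# Illustrative sketch: keyset pagination over records sorted by ascending id.
def paginate_keyset(records, since_id=None, limit=20):
    """Return up to `limit` records whose id is strictly greater than `since_id`."""
    if since_id is not None:
        records = [r for r in records if r["id"] > since_id]
    return records[:limit]

# First request carries no key; the next one passes the last id seen.
data = [{"id": i} for i in range(1, 51)]
first = paginate_keyset(data, limit=20)
second = paginate_keyset(data, since_id=first[-1]["id"], limit=20)
```

Because each page starts strictly after the last key seen, rows inserted or deleted earlier in the data set cannot shift the page boundaries, which is what prevents duplicates and gaps.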
Cursor-based pagination
This approach uses a cursor or pointer to move through the data. Think of a cursor as a backend-determined bookmark that marks a specific record in the data set.
Each request not only retrieves data but also provides a cursor pointing to the start of the next data segment.
Example: GET /products?limit=20&cursor=abcd
Pros: It is efficient and flexible, so it’s ideal for working with large data sets even if they don’t have sequentially sortable data. It also helps reduce issues with data consistency for dynamic data sets.
Cons: It is more complex to set up and doesn't allow direct access to specific pages, only sequential navigation from the current position.
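One common way to make the cursor opaque is to base64-encode the position on the server, so clients treat it as a bookmark rather than a value they can construct or tamper with. A sketch under that assumption (the token format here is illustrative, not a standard):

```python
# Illustrative sketch: cursor pagination with an opaque base64 token.
import base64
import json

def encode_cursor(last_id):
    """Wrap the position in an opaque token clients can't depend on."""
    return base64.urlsafe_b64encode(json.dumps({"last_id": last_id}).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["last_id"]

def paginate_cursor(records, cursor=None, limit=20):
    """Return a page plus the cursor for the next one (None when exhausted)."""
    last_id = decode_cursor(cursor) if cursor else 0
    page = [r for r in records if r["id"] > last_id][:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if len(page) == limit else None
    return page, next_cursor
```

Internally this sketch still relies on a key, but the API contract only exposes the token, which leaves the backend free to change how positions are tracked.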
Pagination best practices
To maximize the benefits of API pagination, it's essential to consider a set of best practices:
Consistent Naming Conventions. Use standardized names for pagination parameters, such as offset and limit or page and size. Consistency in parameter naming makes your API intuitive and easier to use for other developers.
Include Pagination Metadata in Responses. Metadata includes information like the total number of records, the current page number, the total number of pages, and links to the next and previous pages. Such information allows consumers to navigate the dataset more effectively.
Optimize Page Size. Choosing the page size is a matter of trade-offs. Smaller sizes can lead to quicker response times and reduced server load, but larger sizes minimize the requests needed to traverse a dataset. Consider the nature of the data and the user needs when setting the default page size, and provide the option for users to specify a custom page size within reasonable limits.
Sorting and Filtering. Providing sorting and filtering capabilities allows users to retrieve data in a meaningful order and focus on what's most relevant to them.
Ensure Pagination Stability. Adding or removing records should not affect the data sequence presented to the user in paginated responses.
Edge Cases and Errors. Be prepared to handle scenarios such as requests for pages beyond the available data range or invalid parameters. Providing clear and informative error messages is essential.
Caching Strategies. Caching paginated responses can significantly improve the efficiency of your API by reducing server load and speeding up response times for frequently accessed data. Consider various caching strategies and mechanisms like HTTP ETags or Last-Modified headers for conditional caching.
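To illustrate the metadata practice above, here is a sketch of a paginated response envelope. The field names (data, meta, links, total_records) are illustrative choices, not a fixed standard.

```python
# Illustrative sketch: a response envelope carrying pagination metadata.
def build_page_response(items, page, size, total):
    """Assemble page data plus metadata and next/prev links."""
    total_pages = max(1, -(-total // size))  # ceiling division
    base = "/products?size=%d&page=%d"
    return {
        "data": items,
        "meta": {
            "page": page,
            "size": size,
            "total_records": total,
            "total_pages": total_pages,
        },
        "links": {
            "next": (base % (size, page + 1)) if page < total_pages else None,
            "prev": (base % (size, page - 1)) if page > 1 else None,
        },
    }
```

Returning None (serialized as null) for a missing next or prev link gives clients an unambiguous signal that they have reached the first or last page.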
Posts that made me think
Don't be the developer who makes things sound as complicated as possible to look smarter. Be the one who tries to simplify things to make them as easy as possible for others.
The worst way to approach a behavioral interview is to not prepare good answers to common questions in advance.