Database categories
A taxonomy of the most used types of database available on the market: relational, NoSQL, time-series and NewSQL.
Hi Friends,
Welcome to the 119th issue of the Polymathic Engineer newsletter.
One of the most important choices you have to make when building an application is which database to use. There are many different database options out there, and each one has its pros and cons. If you don't know what makes them different, selecting the ones that best fit your project's needs is almost impossible.
On top of that, more complex applications use more than one database, with each one handling a different set of tasks. While this approach lets you leverage the strengths of each database type and optimize your data management strategy, it also makes the overall decision process harder.
In this article, we will talk about the four main types of databases: relational, NoSQL (document, column, key-value, and graph), time-series, and NewSQL databases.
The goal is to discuss their most important features, how they can be used, and the pros and cons of each to help you make an informed choice for your next project.
Project-based learning is the best way to develop technical skills. CodeCrafters is an excellent platform for practicing exciting projects, such as building your version of Redis, Kafka, DNS server, SQLite, or Git from scratch.
Sign up, and become a better software engineer.
Relational Databases
For many years, relational databases have been the main way that data is stored, and for good reason.
They are based on the relational model, which sets up data into tables. Each table is made of rows that show data records as tuples and columns that represent attributes of those records.
Each row has a primary key that makes it unique, while foreign keys connect tables so that you can quickly query related data even if they have complex relationships.
Tables follow a fixed schema, which makes it easier to handle and keep up with a large amount of structured data. The schema defines what kinds of data are in each column, what limits they have, and how they relate to each other. The database makes sure that each row follows the format that was set up front.
Structured Query Language can support complex queries and data manipulations and is the de facto standard for relational databases. SQL is a declarative language, which means you tell the database engine what data you want, and it figures out how to get it. The engine makes use of indexing techniques and query optimization strategies to improve performance and use fewer resources.
Transactions are another area where relational databases really shine. They make sure that a series of operations are carried out in a reliable and consistent way. Relational databases provide support for ACID transactions. Here is what each part of the acronym means:
Atomicity: all operations are treated as a single unit, and they all win or fail at the same time. If something goes wrong, the changes are undone.
Consistency: once a transaction is done, any application-level state stays the same.
Isolation: multiple transactions can run at the same time without any race conditions.
Durability: once a transaction is committed, its results are permanent, even in the event of a system crash.
Despite all these features, relational databases are not perfect and may not always be ideal for your use cases.
The predefined schema can make it hard to adapt to different requirements. Usually, changing the schema means making big changes to data and applications already in use. Relational databases are specifically designed for structured data, and are not ideal for managing unstructured or semi-structured data like social media posts or sensor data.
Moreover, it can be tough to scale these databases horizontally by adding more nodes, especially when compared to some NoSQL options. Executing joins and maintaining ACID properties is much more challenging in a distributed environment.
If you want to use a relational database, you should consider what you need it for. You should think about things like how consistent your data has to be, how complex your queries are, and how much you need to scale. Some popular relational databases are MySQL, PostgreSQL, Microsoft SQL Server, and Oracle.
NoSQL Databases
As we've seen, relational databases aren't always the ideal choice when it comes to speed, scalability, and flexibility. This is the reason why NoSQL systems were made.
Such a family of databases doesn't strictly use traditional table-based storage. They can store data in various formats, which makes them suitable for a wide range of use cases.
NoSQL databases come in four main types: document-based, column-based, key-value, and graph.
Document-based databases use formats like JSON or BSON to store data in partially structured documents. This approach has two main benefits.
First, you can store data that is related in a single document, making read operations faster and reducing the need for complicated joins. This works well for hierarchical or nested data structures used by content management systems, e-commerce platforms, and analytics.