

Key and Values
An overview of key-value databases. Plus the most common storage types in the cloud.
Hi Friends,
Welcome to the 43rd edition of the Polymathic Engineer newsletter. This week we will focus on a specific type of non-relational database: key-value stores. Plus, we will explore the ways you can store data in the cloud.
The outline will be as follows:
what are key-value databases
pros and cons of key-value databases
most popular key-value database implementations
cloud storage types
interesting tweets
Key-value databases
NoSQL databases are known for their flexibility and performance, but no single type of NoSQL database is suitable for all use cases.
Key-value databases are the simplest type of NoSQL databases and provide a simple and efficient way to store data as key-value pairs.
They work a lot like the hash table data structure, storing groups of key-value pairs, where keys can point to any value.
Keys are usually simple objects like integers or strings, while values can be more complicated objects such as JSON documents, lists, BLOBs, or arrays.
Key-value databases have no SQL-style query language to describe which values to fetch, and keys are typically the only way to reference data. Only some key-value databases compensate for the lack of a query language by implementing limited search capabilities for values.
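To make the hash table analogy concrete, here is a minimal sketch of a key-value store backed by a Python dictionary. The class name `KeyValueStore` and its methods are made up for illustration and do not mirror the API of any real database.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Writing to an existing key replaces the whole value.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # The key is the only handle on the data: there is no query language.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42", {"name": "Ada", "languages": ["python", "go"]})
store.put("page-views", 7)
print(store.get("user:42"))
print(store.get("unknown-key"))
```

Note how the value can be a simple integer or a nested structure, while the lookup always goes through the key.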
Use cases
Key-value databases can be effectively used in the following scenarios:
• Caching: Websites and applications repeatedly use certain pieces of data, and key-value databases allow rapid access to this data. For example, they can store webpages with the URL as the key and the webpage content as the value. This speeds up response times dramatically.
• Configuration data: Software systems often have settings or configurations that must be consistently accessed and possibly updated across the system. A key-value database can save each configuration item with a unique key so the system can efficiently read or update these settings as required.
• Session data: many web applications are session-oriented. The application starts a session when a user logs in, which is active until the user logs out or the session times out. Each session has a unique ID, and all the related information (like user preferences, cart items, or browsing history) can be identified in a key-value database using that ID as a key. When a user returns, the system uses the ID to pull up all the session's associated data instantly.
• Large volume of small reads and writes: key-value databases are ideal for use cases that require a large volume of small and continuous reads and writes. They can provide fast in-memory access to volatile or frequently changing data. This makes them perfect for applications where performance and low-latency access to data are critical, such as gaming platforms and recommendation engines.
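The session data scenario can be sketched in a few lines. This is an illustrative in-memory version, assuming a hypothetical `SessionStore` class with a fixed timeout; a real deployment would delegate storage and expiry to the database itself.

```python
import time
import uuid
from typing import Any, Callable, Dict, Optional, Tuple


class SessionStore:
    """Toy session store: session ID -> (expiry time, session data)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._sessions: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def create(self, data: Dict[str, Any]) -> str:
        # The unique session ID becomes the key for all session data.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (self._clock() + self._ttl, data)
        return session_id

    def get(self, session_id: str) -> Optional[Dict[str, Any]]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            # The session timed out: drop it and report it as missing.
            del self._sessions[session_id]
            return None
        return data


sessions = SessionStore(ttl_seconds=1800)
sid = sessions.create({"user": "ada", "cart": ["keyboard"]})
print(sessions.get(sid))
```

A single lookup by session ID returns everything tied to the session, which is the access pattern key-value databases are built for.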
Advantages and disadvantages
Advantages of key-value stores:
• Flexibility and simplicity: key-value databases don't impose any structure on the values. The straightforward commands and the absence of data types make them easy to use.
• Speed: many key-value databases keep data in memory and access values directly through keys. This makes it possible to have lower latency and higher throughput.
• Scalability: compared to SQL systems, they can be scaled up or down quickly. Some key-value stores, like DynamoDB, are designed to scale horizontally with virtually no upper limit.
Weaknesses of key-value storage:
• No filtering: key-value databases treat values as blobs and usually cannot figure out what they contain. This means that when a request is made, the whole value is sent back instead of specific pieces of data. Similarly, when only part of a value needs to be changed, the entire value must be rewritten.
• No query language: without a unified query language, queries from one database may not apply to a different key-value database. Moreover, values can't be easily filtered or sorted using queries.
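The sketch below shows what these weaknesses look like from the client's side, assuming values are stored as opaque JSON-encoded bytes in a plain dictionary. The helper names (`put_profile`, `update_email`, `find_keys_by_city`) are hypothetical.

```python
import json
from typing import Dict, List


def put_profile(store: Dict[str, bytes], key: str, profile: dict) -> None:
    # The store only sees opaque bytes, not the fields inside the value.
    store[key] = json.dumps(profile).encode("utf-8")


def update_email(store: Dict[str, bytes], key: str, new_email: str) -> None:
    # Changing one field means a full read-modify-write cycle:
    profile = json.loads(store[key].decode("utf-8"))   # 1. fetch the whole value
    profile["email"] = new_email                       # 2. change it client-side
    store[key] = json.dumps(profile).encode("utf-8")   # 3. write the whole value back


def find_keys_by_city(store: Dict[str, bytes], city: str) -> List[str]:
    # Without server-side filtering, the client scans and decodes every value.
    matches = []
    for key, raw in store.items():
        if json.loads(raw.decode("utf-8")).get("city") == city:
            matches.append(key)
    return matches


db: Dict[str, bytes] = {}
put_profile(db, "user:1", {"name": "Ada", "city": "London", "email": "ada@old.example"})
put_profile(db, "user:2", {"name": "Linus", "city": "Helsinki", "email": "linus@example.org"})
update_email(db, "user:1", "ada@new.example")
print(find_keys_by_city(db, "Helsinki"))
```

Both the partial update and the search by value end up as client-side work, which is the price of treating values as blobs.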
Implementations
Popular key-value storage implementations:
• DynamoDB: it's a fully managed, serverless storage service that is part of AWS. It offers reliable performance at any scale and comes with built-in security, backup and restore, and in-memory caching. An item in DynamoDB is made up of a primary key (simple or composite) and any number of attributes. There is no explicit limit on how many attributes can be linked to a single item, but the total size of an item can't be more than 400 KB.
• ZooKeeper: a strongly consistent, highly available store often used for dynamic configuration.
• Redis: it's an in-memory database that teams use for many things: building caches, managing authentication sessions, chat and messaging, and any other use case that prioritizes real-time, rapid data retrieval. Redis is open source, but paid managed hosting is also available.
• Etcd: a strongly consistent, highly available store often used for implementing leader election.
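To give an idea of why a strongly consistent key-value store is handy for leader election, here is a simplified lease-based sketch. It runs against an in-memory stand-in, not against etcd or ZooKeeper: `LeaseStore`, `put_if_absent`, and `try_become_leader` are hypothetical names, and lease renewal and failure handling are left out.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class LeaseStore:
    """In-memory stand-in for a strongly consistent store with expiring keys."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()  # makes check-and-set atomic in this toy
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put_if_absent(self, key: str, value: str, ttl: float) -> bool:
        with self._lock:
            now = self._clock()
            current = self._entries.get(key)
            if current is not None and current[1] > now:
                return False  # another node still holds a live lease
            self._entries[key] = (value, now + ttl)
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            current = self._entries.get(key)
            if current is None or current[1] <= self._clock():
                return None  # missing or expired
            return current[0]


def try_become_leader(store: LeaseStore, node_id: str, ttl: float = 10.0) -> bool:
    # Every candidate races to write the same key; only one write can win.
    return store.put_if_absent("leader", node_id, ttl)


store = LeaseStore()
print(try_become_leader(store, "node-a"))
print(try_become_leader(store, "node-b"))
print(store.get("leader"))
```

The important property is that the check and the write happen atomically, so two nodes can never both believe they won. When the lease expires without being renewed, the key becomes free and another node can take over.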
Types of cloud storage
There are 3 ways to store data in the cloud: block, object, and file storage. Most systems use all of them for specific purposes. So, it's essential to understand how these forms of storage work and their pros and cons.
Block storage divides data into fixed-size blocks and stores them as individual units. Each block has a unique address assigned with a logical block addressing scheme, and the storage is usually exposed as a volume. Block volumes can be formatted and used as a file system or managed directly by applications. Block is the most flexible and performant form of storage, but a volume is tightly coupled to a single server and typically cannot be shared. The main problem with block storage is that this tight coupling limits scalability.
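As a rough illustration of fixed-size blocks and logical block addresses, the sketch below splits a byte string into blocks and maps a byte offset to its block. The 512-byte block size and the function names are assumptions chosen for the example.

```python
from typing import List

BLOCK_SIZE = 512  # assumed block size in bytes, for illustration only


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks; the list index plays the role of the address."""
    blocks = []
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        # The last block is padded with zero bytes so every block has the same size.
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


def lba_for_offset(byte_offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the logical block address that contains the given byte offset."""
    return byte_offset // block_size


volume = split_into_blocks(b"a" * 1000)
print(len(volume))
print(lba_for_offset(600))
```

Because every block has the same size, finding the block for any offset is a single integer division, which is part of what makes block storage fast.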
File storage is a general-purpose solution built on top of block storage that provides a higher-level abstraction for easier file and directory handling. It is accessible via file-level network protocols such as SMB and NFS, making it a good fit for sharing large files and folders within an organization. File-level access control lets you set up permissions and access control lists (ACLs) to increase security. The main problem with file storage is that it struggles to manage many small files.
Object storage is the most recently introduced form of storage. It is a highly durable, large-scale, low-cost storage solution for archiving and backing up unstructured data. It stores data as flat objects, usually accessible through a RESTful API. It is slower than the other storage options. AWS S3 and Azure Blob Storage are well-known object storage implementations. The main problem with object storage is that its slower performance does not fit real-time needs.
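The flat object model can be sketched as follows. This is an in-memory toy, not the S3 or Azure API: `FlatObjectStore` and its methods are hypothetical, and the comments only hint at how the operations would map to HTTP verbs in a RESTful interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy object store: one flat namespace mapping keys to objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> None:
        # Roughly what an HTTP PUT on the object's URL would do.
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def get(self, key: str) -> StoredObject:
        # Roughly an HTTP GET; raises KeyError if the object does not exist.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # There are no real directories: "folders" are just shared key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = FlatObjectStore()
bucket.put("backups/2024/db.dump", b"...", {"content-type": "application/octet-stream"})
bucket.put("backups/2024/logs.tar", b"...")
bucket.put("images/logo.png", b"...")
print(bucket.list_keys("backups/"))
```

In this model, a path-like key such as "backups/2024/db.dump" is a single flat name, and listing by prefix only gives the appearance of a folder hierarchy.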
Interesting tweets
Entity-relationship diagrams are a great way of visualising such relationships and are helpful when designing or redesigning a database. Nothing is better than ER diagrams for designing and modelling object models for high-level layers. link
Anyone who wants a promising career as a software engineer should continually strive to improve in these 3 things. Written communication, coding proficiency, and the ability to deal with people are the 3 pillars.
Domain knowledge and communication skills are more important than technical skills. An average engineer who excels at them will advance further than a more technically skilled engineer who doesn't. Link