Arrays
All you need to know about the array data structure, including multidimensional and dynamic arrays.
Hi Friends,
Welcome to the 94th issue of the Polymathic Engineer newsletter.
This week, we talk about about Arrays. Despite Arrays being the most widely used data structure, only a few developers understand how they operate and know their efficiency tradeoffs.
The outline is as follows:
What is an array
Multidimensional Arrays
Advantages and downsides of arrays
Dynamic Arrays
Project-based learning is the best way to build technical skills. CodeCrafters is an awesome platform to practice interesting projects such as building your version of Redis, HTTP server, and even Git from scratch.
Sign up, and become a better software engineer.
What is an array
Arrays are fundamental data structures in computer science since they are building blocks for many more complex structures.
Most high-level languages hide how computers deal with arrays. However, they are so commonly used that knowing how they work under the hood is important.
At its core, an array is a contiguous block of memory that stores elements of the same type. A member from the array can be selected by specifying the member’s array index with an integer. All of the indexes of an array are also numerically contiguous, usually starting from 0.
A nice analogy to think of an array is like a street full of identical houses. Each house (array element) has a unique address (index), and you can compute the exact position of each house immediately from its address.
The base address of an array is the address of the first element of the array, and it is at the lowest memory location. The second array element directly follows the first in memory, and the third element follows the second, and so on.
Since all the elements are of the same type, when you apply an index to an array, you can access the memory address of the corresponding element efficiently in constant time. In the following example, A[i] chooses the i-th element from array A.
You can compute the byte address of any given element of an array using the following formula:
Element_Address = Base_Address + (index * Element_Size)
The Element_Size field indicates the number of bytes each element occupies. If the array contains elements of type byte, the field is equal to 1, simplifying the computation.
Multidimensional Arrays
A multidimensional array lets you select its elements using two or more independent index values.
A classic example is when a two-dimensional array represents an image. One index is a row of pixels, while the other is a column of pixels. The element of the array selected by these two indexes is a specific pixel.
Multidimensional arrays are still stored as a single contiguous block of memory. The actual mapping of positions within the array grid to memory addresses can be done in two different ways.
Row-major ordering assigns array elements to successive memory locations by moving across the rows and then down the column according to this formula:
Element_Address = Base_Address + ((col_index * row_size) + row_index) * Element_Size
Column-major ordering goes the other way around using a similar formula:
Element_Address = Base_Address + ((row_index * col_size) + col_index) *
Element_Size
Most high-level programming languages use row-major ordering.
Advantages and downsides of arrays
As we’ve already observed, the main advantage of using arrays is the constant-time access to arbitrary elements given their indexes.
However, there are other benefits that it’s worth to keep in mind. The first is space efficiency. Since arrays are made up of only data, there is no need to waste memory space to store links or formatting information. For example, information about the end of an element is not needed because all elements have a fixed size.
Another advantage is memory locality. Iterating through all the elements of a data structure is quite common in programming, and this works well with arrays because they have great memory locality. The contiguity between data accesses lets computer architectures make the most of their fast cache memory.
The main downside of arrays is that you cannot adjust their size in the middle of a program’s execution. A workaround would be to allocate extremely large arrays, but this can waste memory space. For this reason, most programming languages have built-in dynamic arrays.
Dynamic Arrays
While standard arrays are fixed in size, dynamic arrays offer more flexibility, as they can grow or shrink as needed.
Here's how dynamic arrays work:
Start with a fixed-size array.
When the array fills up, create a new array twice the size of the original.
Copy all elements from the old array to the new one.
Delete the old array.
This doubling strategy may seem inefficient, but it's pretty clever because most elements are copied infrequently. Half the elements move only once, a quarter move twice, and so on.
On average, each element is moved only twice during the entire growth process, and the total work of managing the dynamic array is still O(n) as if we had allocated the correct size from the start.
Basically, there is a trade-off. Dynamic arrays lose the guarantee that every access is done in constant time (due to occasional resizing operations), but they provide an amortized O(1) time for appending a new element.
Food for thoughts
I worked on train signaling systems with different safety integrity levels. Designing and implementing a SIL4 system from scratch was challenging, but I learned a lot. However, we didn't use a built-in real-time OS like VxWorks; we implemented our own real-time OS. It was a management decision, and I didn't have enough experience at that time to understand all the implications. I later understood how challenging it is to reverse engineer everything to port on a different platform. Link
Time and energy are limited, so it's important to optimize the use of your time as much as possible. In any case, reducing your expectations on side hustles is better than having an unhappy family life.
Storage engines are typically 'embedded' rather than running as a separate server, container, or process to reduce communication overhead with external components. This is critical because it allows for the fastest access to data. Link
Interesting Read
Here are some interesting posts I read this week:
Arrays are where simplicity meets power—constant time access with minimal overhead.
Thanks for the mention, Fernando!
Great intro to Arrays Fernando.
Such a fundamental data structure and so versatile.
Also, thanks for the mention!