Distributed aka Multi-Node Database

The solution is to deploy the database on more nodes so that it can handle the load. The code stays the same; only the deployment changes.

Problem:

There is one topic we have not discussed yet: how data is distributed across nodes and storage servers. Initially, we distributed data using hashing. This is convenient because ingestion is spread evenly across all storage servers on all nodes. However, there is an important tradeoff: every query now hits all storage servers, even a query that reads a single row by its primary key. That fan-out makes queries too expensive.
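The fan-out can be illustrated with a minimal sketch. It is not the database's actual code: the cluster size, the `id` column, and the helper names are hypothetical, and it assumes that placement is derived from a hash of the whole row, so the primary key alone does not tell a reader where the row lives.

```python
import hashlib

NUM_SERVERS = 4  # hypothetical cluster size

# One in-memory list per storage server (stand-in for real storage).
servers = [[] for _ in range(NUM_SERVERS)]


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_by_row_hash(row: dict) -> int:
    # Assumption: placement depends on the full row content,
    # not only on the primary key.
    return stable_hash(repr(sorted(row.items()))) % NUM_SERVERS


def ingest(row: dict) -> None:
    servers[place_by_row_hash(row)].append(row)


def lookup(pk: int):
    """Point lookup by primary key; returns (row, servers_contacted)."""
    contacted = 0
    found = None
    for shard in servers:  # the key does not identify a server: ask them all
        contacted += 1
        for row in shard:
            if row["id"] == pk:
                found = row
    return found, contacted


for i in range(1000):
    ingest({"id": i, "payload": f"row-{i}"})

row, contacted = lookup(42)
```

In this sketch a single-row lookup contacts every server, which is the cost described above: writes are spread out, but no read can be routed.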

Solution:

The solution is to split tables by primary key, so that each storage server holds a known portion of the key space. A query that reads a row by its primary key can then go straight to the one server that holds it, while the data as a whole is still distributed. Both reads and writes stay efficient.
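One way to picture splitting by primary key is range partitioning, sketched below. This is an illustration under assumptions rather than the database's actual scheme: the split points, the integer `id` key, and the function names are all hypothetical.

```python
import bisect

# Hypothetical split points: partition 0 holds keys < 250,
# partition 1 holds 250..499, partition 2 holds 500..749, partition 3 the rest.
SPLIT_POINTS = [250, 500, 750]

# One dict per partition, keyed by primary key (stand-in for real storage).
partitions = [{} for _ in range(len(SPLIT_POINTS) + 1)]


def partition_for(pk: int) -> int:
    """Map a primary key to the single partition that owns it."""
    return bisect.bisect_right(SPLIT_POINTS, pk)


def write(row: dict) -> int:
    """Store the row; returns the number of partitions contacted."""
    partitions[partition_for(row["id"])][row["id"]] = row
    return 1


def read(pk: int):
    """Point lookup; returns (row, partitions_contacted)."""
    return partitions[partition_for(pk)].get(pk), 1


for i in range(1000):
    write({"id": i, "payload": f"row-{i}"})

row, contacted = read(42)
```

Here both `write` and `read` touch exactly one partition, because the key itself determines the owner. With evenly spread keys the data still ends up distributed across all partitions.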