NoSQL – why?
If you follow backend development or Big Data, you’ve probably noticed that for the last couple of years there has been a lot of hype about NoSQL databases.
In this article, I will describe why NoSQL databases were created in the first place, what problems they solve, and why we suddenly need so many different databases.
What is wrong with relational databases?
They worked fine for many years, but now we face an obstacle that they can’t handle anymore.
Relational databases are designed to run on a single machine, and if you need to handle more requests, you have only one option: buy a bigger computer with more memory and a better CPU. Unfortunately, there is a limit to how many requests a single machine can handle, so we need a different database technology that can run on multiple machines.
Now, some of you may scoff and say that there are two widespread techniques for using multiple machines with a relational database: replication and sharding. But they are not sufficient to handle the challenges that we are facing.
Replication is a technique in which every update to your database is propagated to other hosts that handle only read requests. In this case, all changes are applied by a single host, called the leader, while all other hosts, called read replicas, maintain a copy of the data. A user can read from any machine, but can change data only through the leader host. This is a useful and very popular technique, but it only lets you handle more read requests and doesn’t solve the problem of handling the required amount of incoming data.
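To make the leader and read replica roles concrete, here is a minimal in-memory sketch. The class and method names are hypothetical, and propagation is done synchronously for simplicity; a real database would usually ship changes to replicas in the background.

```python
class ReplicatedStore:
    """Toy leader / read-replica setup (hypothetical, in-memory only)."""

    def __init__(self, replica_count):
        self.leader = {}
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        # Every change is applied by the single leader first...
        self.leader[key] = value
        # ...and then propagated to each read replica.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, replica_index):
        # Any replica can serve a read, which spreads the read load.
        return self.replicas[replica_index].get(key)


store = ReplicatedStore(replica_count=3)
store.write("user:1", "Alice")
print([store.read("user:1", i) for i in range(3)])
```

Notice that adding replicas adds read capacity, but every write still passes through the one leader.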
Sharding splits your data across several machines, and each machine handles reads and writes for its own part of the data. While sharding allows you to write more data, managing a sharded database can be a nightmare. You have to balance data across machines and scale your cluster up and down when necessary. While it may look simple in theory, implementing it correctly is a major challenge.
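A small sketch can show why rebalancing is painful. It assumes the simplest possible scheme, where a key's shard is a stable hash of the key modulo the number of shards; `shard_for` is a hypothetical helper, not the API of any particular database.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


keys = [f"user:{i}" for i in range(1, 1001)]

# Placement with 4 shards, then again after adding a fifth machine.
before = {key: shard_for(key, 4) for key in keys}
after = {key: shard_for(key, 5) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys change shard when going from 4 to 5 shards")
```

With this naive scheme, adding one machine relocates most of the keys, and all of that data has to be copied while the database keeps serving requests.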
How does NoSQL work in this case?
NoSQL databases let you control how each query is executed. In one way or another, they allow you to specify two parameters when you perform a read or write operation:
W – how many machines in a cluster should acknowledge that they have stored your data when you perform a write. The more machines you write data to, the easier it is to read the latest data with the next read, but the more time it will take.
R – how many machines you want to read data from. In a distributed system, it may take some time for data to propagate to all machines in the cluster, so some hosts can have the latest data while others still lag behind. The more machines you read data from, the higher your chances of reading the latest data.
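The two parameters can be sketched with a toy cluster. Everything here is an assumption made for illustration: `ToyCluster` is a hypothetical class, machines are picked at random, each write gets a version number, and nothing is propagated in the background as a real database would do.

```python
import random


class ToyCluster:
    """Toy cluster in which each machine maps key -> (version, value)."""

    def __init__(self, n, seed=0):
        self.machines = [{} for _ in range(n)]
        self.version = 0
        self.rng = random.Random(seed)

    def write(self, key, value, w):
        # Store the value on w randomly chosen machines.
        self.version += 1
        for machine in self.rng.sample(self.machines, w):
            machine[key] = (self.version, value)

    def read(self, key, r):
        # Ask r randomly chosen machines and keep the newest answer.
        answers = [m[key] for m in self.rng.sample(self.machines, r) if key in m]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]


cluster = ToyCluster(n=5)
cluster.write("color", "red", w=5)
cluster.write("color", "blue", w=3)   # two machines still hold "red"
print(cluster.read("color", r=3))     # three reads always reach a "blue" machine
print(cluster.read("color", r=1))     # a single read may return "red" or "blue"
```

A read that touches more machines is more likely to include one that has the newest version, at the cost of more work per request.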
Let’s get more practical. If you have five machines in your cluster and you decide to write data to only one machine and then read data from one random machine, you have an 80 percent chance of getting stale data. On the other hand, you will use a minimum amount of resources, so if you can temporarily tolerate stale data, you can choose this option. In this case, the W parameter is equal to 1 and R is equal to 1 as well.
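The 80 percent figure can be checked with a short calculation. This sketch assumes that the machines for each operation are picked uniformly at random and that no data has propagated between the write and the read; `stale_read_probability` is a hypothetical helper name.

```python
from math import comb


def stale_read_probability(n, w, r):
    """Chance that r randomly chosen machines all miss the w machines
    that hold the latest write, assuming nothing has propagated yet."""
    return comb(n - w, r) / comb(n, r)


print(stale_read_probability(n=5, w=1, r=1))  # 0.8
print(stale_read_probability(n=5, w=2, r=2))  # 0.3
```

With one write and one read on five machines, four of the five possible reads miss the machine that has the new value, which gives 4/5 = 0.8.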
On the other hand, if you write data to all five machines at once, you can read data from any single machine and it is guaranteed that you will get the latest data every time. Writing to more machines takes longer, but if consistency is important to you, you can do this. In this case, W=5 and R=1.
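What is the minimum number of reads and writes we need to perform to have a consistent database? Here is a simple formula: R + W ≥ N + 1, where N is the number of machines in the cluster. When this holds, every set of machines you read from overlaps with every set of machines you wrote to, so at least one of them has the latest data.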
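As a quick illustration of the formula, the sketch below lists the smallest R that keeps reads consistent for each W in a five-machine cluster. The function names are hypothetical.

```python
def is_consistent(n, w, r):
    """True when R + W >= N + 1, so reads and writes always overlap."""
    return r + w >= n + 1


def minimal_reads(n, w):
    """Smallest R that satisfies R + W >= N + 1 for a given W."""
    return n + 1 - w


n = 5
for w in range(1, n + 1):
    r = minimal_reads(n, w)
    print(f"W={w} needs R>={r} (consistent: {is_consistent(n, w, r)})")
```

For five machines this gives the pairs W=1/R=5, W=2/R=4, W=3/R=3, W=4/R=2 and W=5/R=1, so you can trade write cost against read cost without giving up consistency.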
Thanks for your attention!