DynamoDB vs LeanXcale
1. Introduction
LeanXcale is a versatile database that combines SQL functionality with NoSQL characteristics. This allows for a range of benefits in flexibility, scalability and maintenance.
NoSQL databases are widely used, and the key-value datastore is probably the most common type. Key-value databases are designed for storing, retrieving and managing associative arrays.
Records are stored and retrieved using a key that uniquely identifies each record and is used to quickly find the data within the database. These databases are used in many scenarios, such as cache matching, real-time pricing or IoT.
Amazon DynamoDB is a popular key-value database that offers an easy-to-use system, fully managed and integrated within Amazon Web Services (AWS).
1.1. LeanXcale KiVi
LeanXcale provides a key-value database engine called KiVi. KiVi is a low-latency key-value datastore that provides fast bulk insertion, update, get and scan operations. It has been designed and implemented after many years of research on distributed systems.
1.2. Benchmark
In this article we compare the performance of LeanXcale, accessed through the KiVi Java API that connects directly to the datastore engine, with that of DynamoDB, using fair workload scenarios. We will show how you can experience marked improvements when choosing LeanXcale over DynamoDB.
1.3. Benchmarking Tool
The benchmarking tool is YCSB, the Yahoo Cloud Serving Benchmark, an open-source specification and program suite commonly used to compare the performance of NoSQL database systems.
2. Data Model
The data model is the same for both databases, because the same processes (YCSB clients) run the benchmark against each of them and only the connection URL changes.
The benchmark is based on a table with 1 string field as primary key and 10 more string fields filled with random characters.
CREATE TABLE usertable (YCSB_KEY varchar(100) PRIMARY KEY,
FIELD0 varchar(255),
FIELD1 varchar(255), FIELD2 varchar(255),
FIELD3 varchar(255), FIELD4 varchar(255),
FIELD5 varchar(255), FIELD6 varchar(255),
FIELD7 varchar(255), FIELD8 varchar(255),
FIELD9 varchar(255));
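As a rough illustration of this data model, the sketch below builds one row the way a YCSB-style client might: a string key plus 10 fields of random characters. The function name, the key format and the 100-character field length are assumptions made for illustration; they are not taken from the benchmark configuration.

```python
import random
import string

FIELD_COUNT = 10      # FIELD0 .. FIELD9, as in the table definition
FIELD_LENGTH = 100    # assumed length of each random field (hypothetical)
ALPHABET = string.ascii_letters + string.digits


def make_row(key_number, rng=random):
    """Build one usertable row as a dict of column name -> string value."""
    row = {"YCSB_KEY": f"user{key_number}"}  # key format is an assumption
    for i in range(FIELD_COUNT):
        row[f"FIELD{i}"] = "".join(rng.choices(ALPHABET, k=FIELD_LENGTH))
    return row


row = make_row(42)
print(sorted(row))                         # column names
print(sum(len(v) for v in row.values()))   # total characters in the row
```

With these assumed sizes, a row carries roughly 1 KB of field data plus the key.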
3. Workloads
The workloads used in the benchmark have been selected to give a fair representation of behavior for both LeanXcale and DynamoDB, and are not weighted in any way towards or against either.
No auto-commit
The auto-commit function for LeanXcale is disabled. This is because we want to be able to manage operations in batch to improve performance.
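The effect of disabling auto-commit can be sketched as follows: rows are sent in batches and one explicit commit is issued per batch rather than per row. This is only an illustration that uses Python's built-in sqlite3 module as a stand-in datastore; it is not the LeanXcale or KiVi API, and the helper name and the reduced two-column table are hypothetical.

```python
import sqlite3

BATCH_SIZE = 1000  # rows per commit (illustrative value)


def load_in_batches(conn, rows, batch_size=BATCH_SIZE):
    """Insert rows, committing once per batch instead of once per row."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "INSERT INTO usertable (YCSB_KEY, FIELD0) VALUES (?, ?)", batch
        )
        conn.commit()  # one explicit commit per batch
        commits += 1
    return commits


# sqlite3 stands in for the real datastore here; it is NOT the LeanXcale API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usertable (YCSB_KEY TEXT PRIMARY KEY, FIELD0 TEXT)")
rows = [(f"user{i}", "x" * 10) for i in range(2500)]
commits = load_in_batches(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM usertable").fetchone()[0]
print(commits, count)
```

Here 2500 rows are loaded with 3 commits instead of 2500, which is the kind of saving that batching is meant to provide.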
3.1. Scenarios
In the benchmark, we use 5 workload scenarios: 4 single-operation scenarios and 1 mixed-operations scenario.
3.1.1. Single
- Load: pure INSERT statements with a batch size of 1000 rows:
  - LeanXcale: 400000 rows
  - DynamoDB: 40000 rows
- Read: pure SELECT statements based on the primary key, getting 50000 rows.
- Update: pure UPDATE statements based on the primary key, updating 50000 rows.
- Scan: pure SELECT statements from a random key till the end of the table, uniform distribution around 1000 rows.
4. Metrics
In the benchmark we collect Throughput (operations per second) and Average Latency (µs) of the operations launched.
All metrics are measured from the YCSB client side for fairness.
5. Setup
5.1. DynamoDB Setup
To get a fair benchmark, we need to configure DynamoDB so that its performance is comparable to that of the KiVi installation:
$\text{write operations} = 35000\ \text{ops/sec}$
$\text{read operations} = 49000\ \text{ops/sec}$
5.1.1. Dimensioning Amazon DynamoDB
DynamoDB can be deployed with a specific provisioned read and write capacity, which disables the auto-scaling functionality of the system. We use this option to get a DynamoDB instance with enough capacity to run the YCSB benchmark.
To find the most appropriate values for the provisioned capacity, we run KiVi on c5d.xlarge AWS instances. This instance type has 4 vCPUs and 8 GB of RAM; however, only 6 GB are assigned to KiVi and the rest is reserved for the operating system.
We run the YCSB clients on c5.2xlarge AWS instances (8 vCPUs and 16 GB of memory), varying the number of YCSB client instances and threads. The workload used for this benchmark is 49.99% read, 50% write and 0.01% scan operations, with each scan retrieving 2,000 tuples. In YCSB, each tuple has an average size of 1.1 KB, and the key is a string. The evaluation starts by populating the database with 400,000 tuples, and the benchmark runs 500,000 operations in total. The benchmark is run with 1, 2, 4, 6 and 10 YCSB client instances and 1, 2, 5, 10, 20 and 30 threads.
Operation | Proportion |
---|---|
Read | 49.99% |
Update | 20% |
Scan | 0.01% |
Insert | 10% |
Read and Modify | 20% |
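The operation mix above can be checked and sampled with a few lines of code. The sketch below assumes, as the 50% write figure implies, that update, insert and read-and-modify operations all count as writes; the dictionary keys and function names are hypothetical.

```python
import random

# Operation mix from the table above, in percent.
MIX = {
    "read": 49.99,
    "update": 20.0,
    "scan": 0.01,
    "insert": 10.0,
    "read_modify_write": 20.0,
}

# Assumption: these three operation types make up the "50% write" share.
WRITE_OPS = ("update", "insert", "read_modify_write")


def write_share(mix):
    """Percentage of operations that modify data."""
    return sum(mix[op] for op in WRITE_OPS)


def next_operation(mix, rng=random):
    """Pick the next operation at random, weighted by the mix."""
    ops = list(mix)
    return rng.choices(ops, weights=[mix[op] for op in ops], k=1)[0]


total = sum(MIX.values())
print(round(total, 2), write_share(MIX), next_operation(MIX))
```

The proportions add up to 100%, and the three modifying operations together account for the 50% write share.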
With this, we can compute the DynamoDB capacity required to run the YCSB benchmark.
$\text{write operations} = 35000\ \text{ops/sec} \times 50\% = 17500\ \text{ops/sec}$
The YCSB tuple size is 1.1 KB, which is larger than the 1 KB covered by a single DynamoDB write capacity unit. Each write therefore consumes 2 units, so DynamoDB needs at least 2 times the number of write operations computed above.
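The write-capacity arithmetic can be sketched as below. It assumes that one DynamoDB write capacity unit covers one write per second of an item of up to 1 KB, so an item slightly above 1 KB is rounded up to 2 units; the function and constant names are hypothetical.

```python
import math

TARGET_THROUGHPUT = 35000   # YCSB ops/sec used in the formula above
WRITE_FRACTION = 0.50       # share of write operations in the workload
ITEM_SIZE_KB = 1.1          # average YCSB tuple size
WCU_ITEM_KB = 1.0           # assumed size covered by one write capacity unit


def write_capacity_units(ops_per_sec, write_fraction, item_kb):
    """Provisioned write capacity needed to sustain the write part of the load."""
    writes_per_sec = ops_per_sec * write_fraction
    units_per_write = math.ceil(item_kb / WCU_ITEM_KB)  # 1.1 KB rounds up to 2
    return writes_per_sec * units_per_write


wcu = write_capacity_units(TARGET_THROUGHPUT, WRITE_FRACTION, ITEM_SIZE_KB)
print(wcu)
```

Under these assumptions, 17500 writes per second at 2 units each give 35000 write capacity units.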
5.2. LeanXcale Setup
LeanXcale is designed to run on clusters out of the box, so no separate clustering tool needs to be installed. All that is required is to update the configuration inventory file with the names of the machines on which you are going to install the database.
5.2.1. JDBC
The LeanXcale JDBC Driver is bundled by default with the distribution package elasticdriver-<version>-jar-with-dependencies.jar, and is placed in the ycsb-<version>/jdbc-binding/lib directory.
6. Results
6.1. Throughput results
The following chart shows the throughput obtained with KiVi and DynamoDB using the YCSB benchmark.
We can see that, even with the provisioned capacity, DynamoDB does not exceed 16,500 YCSB ops/sec in the best scenario. We have also checked that the capacity of the DynamoDB instance is not fully used, so the limit is due to the latency experienced by the YCSB clients.
With KiVi, we got more than 35,000 YCSB ops/sec. KiVi reaches the maximum performance of the system with just 5 threads per client, thanks to its low latency. The YCSB clients run synchronous operations in each thread. Consequently, if an operation takes less time, the thread is ready to run the next operation sooner. With DynamoDB, the maximum throughput is only reached with 10 or even 20 threads per client.
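The relationship between latency and throughput for synchronous clients can be sketched with a simple closed-loop model: each thread completes at most one operation per round trip. The function name and the latency values below are hypothetical and chosen only to illustrate the effect; the model ignores server-side saturation.

```python
def max_throughput(clients, threads_per_client, latency_ms):
    """Upper bound on ops/sec when every thread issues one operation at a time."""
    ops_per_thread = 1000.0 / latency_ms   # operations one thread completes per second
    return clients * threads_per_client * ops_per_thread


# Hypothetical latencies, not measured values from this benchmark.
fast = max_throughput(clients=4, threads_per_client=5, latency_ms=0.5)
slow = max_throughput(clients=4, threads_per_client=5, latency_ms=5.0)
print(fast, slow)
```

In this model, a system with 10 times lower latency sustains 10 times the throughput with the same number of threads, which is why a higher-latency system needs more threads per client to approach its limit.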
6.2. Latency - YCSB Response time
The second objective of this benchmark is to compare the latency of operations on DynamoDB and on KiVi. We will show the average response time, the 95th percentile and the 99th percentile. The 99th percentile shows whether the system is stable and does not produce high latencies depending on its current load.
In these charts we can see that the percentiles of both systems are good. In both cases, the 99th percentile is around 4 or 5 times the average response time. In addition, the 95th percentile is similar to the average in most cases. That means that most queries will have a normal latency, close to the average.
However, the latency of KiVi is much better than that of DynamoDB for all queries.
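For readers who want to reproduce these statistics from raw client measurements, the sketch below computes the average, 95th percentile and 99th percentile with the nearest-rank method. The sample latencies are made up for illustration and the function names are hypothetical; YCSB's own reporting may compute percentiles differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_us):
    """Average, 95th and 99th percentile of a list of latencies in microseconds."""
    return {
        "avg": sum(latencies_us) / len(latencies_us),
        "p95": percentile(latencies_us, 95),
        "p99": percentile(latencies_us, 99),
    }


# Made-up latencies in microseconds: 98 typical operations and 2 slow ones.
samples = [100] * 98 + [450, 500]
stats = summarize(samples)
print(stats)
```

In this made-up sample the 95th percentile stays close to the average, while the 99th percentile is several times higher, which is the pattern described above.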
6.2.1. READ operation
For read operations, KiVi is around 10 times faster than DynamoDB, giving response times lower than 0.5 ms in all cases. DynamoDB has single-digit millisecond latencies, but KiVi responds in less than 50 µs.
$\text{read operations} = 49000\ \text{ops/sec}$
6.2.2. INSERT operations
Something similar happens with inserts and updates. YCSB handles both operations as inserts for key-value databases, so the response times for insert and update operations are similar.
The response time of KiVi is really low because of its cache system: KiVi pre-stores all the new tuples before traversing the tree and putting them in their right place.
6.2.3. SCAN operations
KiVi is around 10 times faster than DynamoDB for scan operations. Each scan chooses a random start key and reads the following 2,000 tuples of the database. The main reason for the difference is that DynamoDB is not optimized for scan operations, while KiVi is built for them and can even apply filters or aggregations.
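The shape of the scan operation described above can be sketched over a sorted list of keys: locate the start key with a binary search, then read the next tuples in order. This is a generic illustration of a range scan on ordered keys, not KiVi's or DynamoDB's implementation; the key format and function name are hypothetical.

```python
import bisect
import random


def scan(sorted_keys, start_key, limit):
    """Return up to `limit` keys in order, starting at the first key >= start_key."""
    start = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[start:start + limit]


# Hypothetical key set kept in sorted order, as an ordered index would keep it.
keys = sorted(f"user{i:07d}" for i in range(10000))
start_key = random.choice(keys)
result = scan(keys, start_key, limit=2000)
print(result[0] == start_key, len(result))
```

When keys are kept in order, the scan costs one lookup plus a sequential read, and it returns fewer than 2,000 tuples only when the start key is close to the end of the table.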
7. Total Cost of Ownership
When calculating the total cost of ownership of your database solution, it is easy to ignore the impact of performance and latency. However, the actual running costs over time can be severely affected by how the database operates.
In this benchmark, the different costs for DynamoDB and LeanXcale are in definite contrast:
DB | ops | cost/month |
---|---|---|
DynamoDB | ~49000 READ/s 35000 WRITES/s | $21665 |
LeanXcale | 35000 | $514 |
8. Time To Market
When developing your solution you want to move quickly from concept to market delivery. The time to market for any new solution is a very important, yet often overlooked aspect of the development decision process. Moving from development to scaling up to a finished product needs to be a smooth transition. Steep learning curves or slow deployment in the initial development can be as detrimental as complicated migrations or basic technology changes at a late stage, and both can have severely detrimental effects on the bottom line.
The ideal database provides a reasonable learning curve, with a versatile set of options and the possibility of deeper sophistication and specialization. You want to be able to start quickly, and then scale up to production levels of complexity without too much context switching for the developers.
8.1. TTM conclusions
LeanXcale lets you use the same database for prototyping, initial development, all the way to finished product, deployment and future upscaling. You can keep a single architecture, avoid migrations, and keep your team focused on a reduced set of tools. All of this while still keeping the needed flexibility and ability to change the solution according to market needs.
8.2. Total cost of ownership
A very important factor to take into consideration for any database solution is the impact of performance on cost.
8.3. KiVi
The cost of KiVi depends mainly on the AWS instance used. For this benchmark, we used the c5d.xlarge, which has 100 GB of storage. The total cost of the instance is $142.8/month. DynamoDB offers a replicated system.
KiVi's replication is costless in terms of latency, but it requires a second instance. LeanXcale is a license-based product; on AWS, its license costs an extra 75% on top of the price of the AWS instance. In total, the KiVi deployment costs $499.97/month.
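The KiVi cost figure can be reconstructed with the arithmetic below: two instances (primary plus replica), each with a 75% license surcharge. The hourly price and the number of hours per month are assumptions, chosen because they reproduce the monthly figures quoted above; they are not stated in the text.

```python
HOURLY_PRICE = 0.192      # assumed on-demand price of one c5d.xlarge, $/hour
HOURS_PER_MONTH = 744     # assumed 31-day month
INSTANCES = 2             # primary instance plus one replica
LICENSE_SURCHARGE = 0.75  # license cost as a fraction of the instance price


def monthly_cost(hourly_price, hours, instances, surcharge):
    """Return (cost of one instance per month, total deployment cost per month)."""
    instance_month = hourly_price * hours
    total = instance_month * instances * (1 + surcharge)
    return instance_month, total


instance_month, total = monthly_cost(
    HOURLY_PRICE, HOURS_PER_MONTH, INSTANCES, LICENSE_SURCHARGE
)
print(round(instance_month, 1), round(total, 2))
```

With these assumed inputs, one instance comes to about 142.8 $/month and the replicated, licensed deployment to about 499.97 $/month.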
9. The upsides of LeanXcale
LeanXcale provides major benefits with a strong impact on important areas of interest:
a) Short time to market: With LeanXcale, you only need one database for all your needs. This means simpler architecture, fewer technologies to master while being able to easily adapt to changed requirements and workload needs.
b) Low total cost of ownership: We squeeze every single cycle of CPU, so you get better performance from the same machine, paying less at the end of the month for the same (better) service.
c) Extreme scalability: When you achieve success, LeanXcale will be ready to handle any workload, and you won’t need to change your architecture to fit even the most extreme performance demands.
9.1. Suitability
The great flexibility and versatility of LeanXcale make it perfect for implementation in a startup, without sacrificing the needs of major enterprises. Your database solution can be created quickly and allows for expansion, modification and growth in pace with your other development.
There is no need to make separate decisions for your quick-and-dirty test implementation and your finalized enterprise product. LeanXcale will follow you all the way.