PostgreSQL vs LeanXcale
1. Introduction
LeanXcale is a versatile database that combines SQL functionality with NoSQL characteristics. This allows for a range of benefits in flexibility, scalability and maintenance.
PostgreSQL is a powerful open source object-relational database system that uses and extends the SQL language.
In this article we will compare an installation of PostgreSQL at Amazon AWS with RDS Aurora service, against an Cluster of 3 Amazon AWS EC2 instances with LeanXcale installed, using fair workload scenarios. We will show how you can experience marked improvements when choosing LeanXcale over PostgreSQL.
In addition, LeanXcale provides KiVi, a distributed storage engine which provides additional benefits for all workloads.
1.1. LeanXcale KiVi
LeanXcale provides a key-value database called Kivi. KiVi is a low-latency key value datastores that grants fast bulk insertion, update, get and scan operations. It has been designed and implemented after many years of research on distributed systems.
1.2. Benchmarking Tool
YCSB, the Yahoo Cloud Serving Benchmark, an open-source specification and program suite commonly used to compare performance of NoSQL database systems.
1.3. Method
The comparison is made by generating emulated workload with YCSB and letting the YCSB clients connect to the databases over JDBC.
2. Specifications
In the benchmark comparison we are using 4 Amazon AWS instances, 1 for the YCSB clients generating the workload, and 3 for the databases.
2.1. Machines
All machines used are hosted at Amazon AWS. We have requested spot instances and built the LeanXcale database clusters by hand. The machines that we are using are r3.xlarge
, so they have the following specifications:
*Instance Family * |
Instance Type |
Processor Arch |
vCPU |
Memory (GiB) |
Instance Storage (GB) |
EBS-optimized Available |
Network Performance |
Memory optimized |
r3.xlarge |
64 bits |
4 |
30,5 |
1 x 80 |
Yes |
Moderate |
For PostgreSQL we are using a db.r5.xlarge
which is the most similar latest generation machine. With a reader/replica node at different availability zone, this reader is used to manually distribute the reading load.
2.2. Software
All machines are running:
-
Ubuntu 16.04 as SO
-
Java Open JDK 8
-
Python 2
-
LeanXcale 0.96-SNAPSHOT (w/KiVi-0.132)
-
PostgreSQL 10.7
2.3. Execution
YCSB clients use a lot of memory, so YCSB Client processes are running on a separate machine than the ones used for the databases cluster in order to do not disturb the metrics.
YCSB Execution plan
YCSB uses a 2 stage execution plan, with a load stage (clients only execute INSERTS statements in order to get a full working set) followed by a run stage (UPDATES, GETS and SCANS).
|
3. Data Model
The data model obviously will be the same for both databases because the process (YCSB Clients) that will run the benchmark is the same just modifying the connection URL.
The benchmark will be based on a table with 1 string field as primary key, a 10 more string fields filled with random characters.
CREATE TABLE usertable (YCSB_KEY varchar(100) PRIMARY KEY,
FIELD0 varchar(255),
FIELD1 varchar(255), FIELD2 varchar(255),
FIELD3 varchar(255), FIELD4 varchar(255),
FIELD5 varchar(255), FIELD6 varchar(255),
FIELD7 varchar(255), FIELD8 varchar(255),
FIELD9 varchar(255));
3.1. Isolation Level
LeanXcale DB supports two isolation levels: Read Committed (default) and Snapshot Isolation as to provide read consistent view of the database to all transactions. For this benchmark we are using the default Read Committed isolation level.
The Serializable level is out of scope for this benchmark, so Read Committed isolation level is used for PostgreSQL as well.
4. Workloads
The workloads used in the benchmark have been selected to give a fair representation of behavior for both LeanXcale and PostgreSQL, and are not weighted in any way towards or against either.
No auto-commit
The auto-commit function for both LeanXcale and PostgreSQL is disabled. This is because we want to be able to manage operations in batch to improve performance.
|
4.1. Scenarios
In the benchmark, we use 5 workload scenarios, 4 one-operation scenarios, and 1 mixed-operations scenario.
4.1.1. Single
-
Load: pure
INSERT
statements with batch size of 1000 rows, loading 400000 rows. -
Read: pure
SELECT
statements based on the primary key, getting 50000 rows. -
Update: pure
UPDATE
statements based on the primary key, updating 50000 rows. -
Scan: pure
SELECT
statements from a random key till the end of the table, uniform distribution around1000 rows
5. Metrics
In the benchmark we collect Throughput (operations per second) and Average Latency (µs) of the operations launched.
All metrics are measured from the YCSB client side for fairness.
6. LeanXcale Setup
LeanXcale is aimed towards clusters out of the box, so there is no special installation of a clustering tool. All that is required is to update the configuration inventory file to contain the names of the machines on which you are going to install the database.
6.1. JDBC
The LeanXcale JDBC Driver is bundled by default with the distribution package elasticdriver-<version>-jar-with-dependencies.jar
, and is placed in the ycsb-<version>/jdbc-binding/lib
directory.
7. Results
7.1. Throughput results
7.1.1. LOAD operation
For the LOAD
scenario, PostgreSQL performance decreases as we add processes and threads.
LeanXcale performance also decreases, but keeps a better and more stable shape, providing almost double performance at the max client-threads point, due to the request management and flow control mechanisms.
7.1.2. READ operation
In the READ
scenario, PostgreSQL gives good performance in the lower range, but drops off rather quickly as number of requests increases, and at the higher range is overloaded.
LeanXcale performance lags in the lower range, but then continues to improve consistently.
7.1.3. UPDATE operation
In the UPDATE
scenario PostgreSQL performs rather evenly over the range. This is due the monolithic architecture, and the JDBC batching strategy. When an update is issued, the database needs to perform a get operation to check if the key exists and then an update operation of the key if exists.
LeanXcale performance is irregular over the range, and only achieves better performance in the highest range. This is due to the LeanXcale cluster architecture. Reads and inserts are distributed over all machines, which means operations cannot be threaded as a batch, so they have to be executed one by one.
7.1.4. SCAN operation
With the SCAN
scenario, PostgreSQL shows a poor performance profile, even though scans are distributed over nodes both PostgreSQL nodes (writer and reader/replica).
LeanXcale uses performance shows an early dropoff and slow degrading, but at a significantly high level. This is because of the Key-Value datastore and allowing data from scans to stream sequentially. Also the high performance mono-thread architecture of the datasource reduces the context switch while getting the results.
7.2. Latency results
7.2.1. LOAD latency
The LOAD
scenario shows PostgreSQL latency at a consistent and low level. This is due to the monolithic structure of PostgreSQL - all inserts go to a single machine, and the behavior is predictable.
LeanXcale handles it differently, and has to allocate the workload over available nodes, which generates increased latency. On the other hand, this allows for greater throughput.
7.2.2. READ latency
For the READ
scenario, PostgreSQL performs reliably and consistently, again an effect of the monolithic architecture.
LeanXcale, again, has to distribute the work over the nodes, which generates higher latency, but provides better throughput.
7.2.3. UPDATE latency
The UPDATE
scenario shows how PostgreSQL has an irregular profile and has behaves unpredictably.
LeanXcale provides a stable throughput and a significantly better latency at all loads.
8. Total Cost of Ownership
When calculating the total cost of ownership of your database solution it is easy to ignore the impact of performance and latency. Hoever, the actual running costs over time can be very severely impacted by how the database operates.
In this benchmark, the different costs for PostgreSQL and LeanXcale are markedly different.
DB | Insert | Read | Scan |
---|---|---|---|
PostgreSQL |
256.35 |
253.11 |
222.54 |
LeanXcale |
54.135 |
57.293 |
DB | Insert | Read | Scan |
---|---|---|---|
PostgreSQL |
6721.90 |
7187.43 |
93292.89 |
LeanXcale |
12101.15 |
9529.08 |
3404775.94 |
9. Time To Market
When developing your solution you want to move quickly from concept to market delivery. The time to market for any new solution is a very important, yet often overlooked aspect of the development decision process. Moving from development to scaling up to a finished product needs to be a smooth transition. Steep learning curves or slow deployment in the initial development can be as detrimental as complicated migrations or basic technology changes at a late stage, and both can have severely detrimental effects on the bottom line.
The ideal database provides a reasonable learning curve, with a versatile set of options and the possobility of deeper sopistication and specialization. You want to be able to do quickly, and then be able to scale up to production levels of complexity without too much of context switching for the developers.
9.1. TTM conclusions
LeanXcale lets you use the same database for prototyping, initial development, all the way to finished product, deployment and future upscaling. You can keep a single architecture, avoid migrations, and keep your team focused on a reduced set of tools. All of this while still keeping the needed flexibility and ability to change the solution according to market needs.
10. KiVi - LeanXcale Direct Datastore API
By now, the benefits of using LeanXcale should be obvious. However, this is only the case of using a standard JDBC connection.
With KiVi, the LeanXcale Direct Datastore API, you can experience even greater performance increases as resources are added.
11. KiVi - Conclusions
When using KiVi, we skip transactions management and query language processing software. This means that we lose the isolation level, which is very useful for bulk load/read scenarios. The increase in performance is notable.
12. The upsides of LeanXcale
LeanXcale provide major benefits with strong impact on important areas of interest:
a) Short time to market: With LeanXcale, you only need one database for all your needs. This means simpler architecture, fewer technologies to master while being able to easily adapt to changed requirements and workload needs.
b) Low total cost of ownership: We squeeze every single cycle of CPU, so you get better performance of the same machine, paying less at the end of the month for the same (better) service.
c) Extreme scalability: When you achieve success, LeanXcale will be ready to handle any workload, and you won’t need to change your architecture to fit even the most extreme performance demands.
12.1. Suitability
The great flexibility and versatility of LeanXcale makes it perfect for implementation in a startup, without sacrificing the needs of major enterprises. Your database solution can be created quickly and allows for expansion, modification and growth in pace with your other development.
There is no need to make separate decisions for your quick-and-dirty test implementation and your finalized enterprise product. LeanXcale will follow you all the way.