PostgreSQL vs LeanXcale

1. Introduction

LeanXcale is a versatile database that combines SQL functionality with NoSQL characteristics. This allows for a range of benefits in flexibility, scalability and maintenance.

PostgreSQL is a powerful open source object-relational database system that uses and extends the SQL language.

In this article we will compare an installation of PostgreSQL at Amazon AWS with RDS Aurora service, against an Cluster of 3 Amazon AWS EC2 instances with LeanXcale installed, using fair workload scenarios. We will show how you can experience marked improvements when choosing LeanXcale over PostgreSQL.

In addition, LeanXcale provides KiVi, a distributed storage engine which provides additional benefits for all workloads.

1.1. LeanXcale KiVi

LeanXcale provides a key-value database called KiVi. KiVi is a low-latency key value datastores that grants fast bulk insertion, update, get and scan operations. It has been designed and implemented after many years of research on distributed systems.

1.2. Benchmarking Tool

YCSB, the Yahoo Cloud Serving Benchmark, an open-source specification and program suite commonly used to compare performance of NoSQL database systems.

1.3. Method

The comparison is made by generating emulated workload with YCSB and letting the YCSB clients connect to the databases over JDBC.

1.4. Scaling

For a more realistic application, we will take a look at behavior for clusters, to show how LeanXcale shines as the needs of the application grows. And we will show how LeanXcale and PostgreSQL clusters behave against the same workloads.

1.5. Setup

As this is a benchmark article and not a tutorial, we will not go through in detail how to set up the databases and clusters. We do point out relevant differences where needed.

2. Specifications

In the benchmark comparison we are using 4 Amazon AWS instances, 1 for the YCSB clients generating the workload, and 3 for the databases.

2.1. Machines

All machines used are hosted at Amazon AWS. We have requested spot instances and built the LeanXcale database clusters by hand. The machines that we are using are r3.xlarge, so they have the following specifications:

*Instance Family   *

Instance Type

Processor Arch

vCPU

Memory (GiB)

Instance Storage (GB)

EBS-optimized Available

Network Performance

Memory optimized

r3.xlarge

64 bits

4

30,5

1 x 80

Yes

Moderate

For PostgreSQL we are using a db.r5.xlarge which is the most similar latest generation machine. With a reader/replica node at different availability zone, this reader is used to manually distribute the reading load.

2.2. Software

All machines are running:

  • Ubuntu 16.04 as SO

  • Java Open JDK 8

  • Python 2

  • LeanXcale 0.96-SNAPSHOT (w/KiVi-0.132)

  • PostgreSQL 10.7

2.3. Execution

YCSB clients use a lot of memory, so YCSB Client processes are running on a separate machine than the ones used for the databases cluster in order to do not disturb the metrics.

YCSB Execution plan
YCSB uses a 2 stage execution plan, with a load stage (clients only execute INSERTS statements in order to get a full working set) followed by a run stage (UPDATES, GETS and SCANS).

3. Data Model

The data model obviously will be the same for both databases because the process (YCSB Clients) that will run the benchmark is the same just modifying the connection URL.

The benchmark will be based on a table with 1 string field as primary key, a 10 more string fields filled with random characters.

Example 1. Table creation
CREATE TABLE usertable (YCSB_KEY varchar(100) PRIMARY KEY,

FIELD0 varchar(255),

FIELD1 varchar(255), FIELD2 varchar(255),

FIELD3 varchar(255), FIELD4 varchar(255),

FIELD5 varchar(255), FIELD6 varchar(255),

FIELD7 varchar(255), FIELD8 varchar(255),

FIELD9 varchar(255));

3.1. Isolation Level

LeanXcale DB supports two isolation levels: Read Committed (default) and Snapshot Isolation as to provide read consistent view of the database to all transactions. For this benchmark we are using the default Read Committed isolation level.

The Serializable level is out of scope for this benchmark, so Read Committed isolation level is used for PostgreSQL as well.

4. Workloads

The workloads used in the benchmark have been selected to give a fair representation of behavior for both LeanXcale and PostgreSQL, and are not weighted in any way towards or against either.

No auto-commit
The auto-commit function for both LeanXcale and PostgreSQL is disabled. This is because we want to be able to manage operations in batch to improve performance.

4.1. Scenarios

In the benchmark, we use 5 workload scenarios, 4 one-operation scenarios, and 1 mixed-operations scenario.

4.1.1. Single

  • Load: pure INSERT statements with batch size of 1000 rows, loading 400000 rows.

  • Read: pure SELECT statements based on the primary key, getting 50000 rows.

  • Update: pure UPDATE statements based on the primary key, updating 50000 rows.

  • Scan: pure SELECT statements from a random key till the end of the table, uniform distribution around 1000 rows

4.1.2. Mixed

  • Read + Update: mixed SELECT with primary key, and UPDATE statements, 25000 rows for each one.

4.2. Clients

For good workload result, we vary the number of YCSB clients and number of threads per client:

  • Number of clients: 10, 23, 35, 50

  • Number of Threads: 10, 23, 35, 50

All possible combinations are used in order to get a reasonable distribution.

5. Metrics

In the benchmark we collect Throughput (operations per second) and Average Latency (µs) of the operations launched.

All metrics are measured from the YCSB client side for fairness.

6. LeanXcale Setup

LeanXcale is aimed towards clusters out of the box, so there is no special installation of a clustering tool. All that is required is to update the configuration inventory file to contain the names of the machines on which you are going to install the database.

6.1. JDBC

The LeanXcale JDBC Driver is bundled by default with the distribution package elasticdriver-<version>-jar-with-dependencies.jar, and is placed in the ycsb-<version>/jdbc-binding/lib directory.

6.2. Automatic clustering

When the driver is in place there is no need to change anything else. All the cluster configuration and load distribution is automated by LeanXcale.

6.3. YCSB clients

When the cluster is up and runnning, YCSB Clients can be started, using the PostgreSQL JDBC client as interface.

7. Results

7.1. Throughput results

7.1.1. LOAD operation

PostgreSQL LOAD Throughput

Throughput postgreSQL

For the LOAD scenario, PostgreSQL performance decreases as we add processes and threads.

LeanXcale performance also decreases, but keeps a better and more stable shape, providing almost double performance at the max client-threads point, due to the request management and flow control mechanisms.

7.1.2. READ operation

PostgreSQL READ Throughput

Throughput postgreSQL

In the READ scenario, PostgreSQL gives good performance in the lower range, but drops off rather quickly as number of requests increases, and at the higher range is overloaded.

LeanXcale performance lags in the lower range, but then continues to improve consistently.

7.1.3. UPDATE operation

PostgreSQL UPDATE Throughput

Throughput postgreSQL

In the UPDATE scenario PostgreSQL performs rather evenly over the range. This is due the monolithic architecture, and the JDBC batching strategy. When an update is issued, the database needs to perform a get operation to check if the key exists and then an update operation of the key if exists.

LeanXcale performance is irregular over the range, and only achieves better performance in the highest range. This is due to the LeanXcale cluster architecture. Reads and inserts are distributed over all machines, which means operations cannot be threaded as a batch, so they have to be executed one by one.

7.1.4. SCAN operation

PostgreSQL SCAN Throughput

Throughput postgreSQL

With the SCAN scenario, PostgreSQL shows a poor performance profile, even though scans are distributed over nodes both PostgreSQL nodes (writer and reader/replica).

LeanXcale uses performance shows an early dropoff and slow degrading, but at a significantly high level. This is because of the Key-Value datastore and allowing data from scans to stream sequentially. Also the high performance mono-thread architecture of the datasource reduces the context switch while getting the results.

7.1.5. READ_UPDATE throughput

PostgreSQL READ_UPDATE Throughput

Throughput postgreSQL

The READ_UPDATE scenario, PostgreSQL has a similar profile to the single UPDATE scenario.

LeanXcale behavior is a mix of UPDATE and READ. The profile looks like an UPDATE but with the performance increase of the READ profile.

7.2. Latency results

7.2.1. LOAD latency

LOAD Latency

latency load

The LOAD scenario shows PostgreSQL latency at a consistent and low level. This is due to the monolithic structure of PostgreSQL - all inserts go to a single machine, and the behavior is predictable.

LeanXcale handles it differently, and has to allocate the workload over available nodes, which generates increased latency. On the other hand, this allows for greater throughput.

7.2.2. READ latency

READ Latency

latency read

For the READ scenario, PostgreSQL performs reliably and consistently, again an effect of the monolithic architecture.

LeanXcale, again, has to distribute the work over the nodes, which generates higher latency, but provides better throughput.

7.2.3. UPDATE latency

UPDATE Latency

latency update

The UPDATE scenario shows how PostgreSQL has an irregular profile and has behaves unpredictably.

LeanXcale provides a stable throughput and a significantly better latency at all loads.

7.2.4. SCAN latency

SCAN Latency

latency scan

For the SCAN scenario, the PostgreSQL behaviour is taken to the extreme, making it virtually unusable.

LeanXcale keeps a constant controlled latency, due to the flow control mechanism and the scan-streaming processes.

7.2.5. READ_UPDATE latency

READ_UPDATE Latency

latency read update

In the READ_UPDATE scenario, PostgreSQL latency is highly irregular, but follows an increasing trend.

LeanXcale keeps a stable profile, and provides an order of magnitude better latency.

8. Total Cost of Ownership

When calculating the total cost of ownership of your database solution it is easy to ignore the impact of performance and latency. Hoever, the actual running costs over time can be very severely impacted by how the database operates.

In this benchmark, the different costs for PostgreSQL and LeanXcale are markedly different.

Table 1. Infrastructure cost USD/Bopm
DB Insert Read Scan

PostgreSQL

256.35

253.11

222.54

LeanXcale

54.135

57.293

Table 2. Operations (Ops/s (avg)
DB Insert Read Scan

PostgreSQL

6721.90

7187.43

93292.89

LeanXcale

12101.15

9529.08

3404775.94

8.1. TCO conclusions

Due to performance and latency efficiencies, LeanXCale comes out to about 5 times less costly than the PostgreSQL solution.

9. Time To Market

When developing your solution you want to move quickly from concept to market delivery. The time to market for any new solution is a very important, yet often overlooked aspect of the development decision process. Moving from development to scaling up to a finished product needs to be a smooth transition. Steep learning curves or slow deployment in the initial development can be as detrimental as complicated migrations or basic technology changes at a late stage, and both can have severely detrimental effects on the bottom line.

The ideal database provides a reasonable learning curve, with a versatile set of options and the possobility of deeper sopistication and specialization. You want to be able to do quickly, and then be able to scale up to production levels of complexity without too much of context switching for the developers.

9.1. TTM conclusions

LeanXcale lets you use the same database for prototyping, initial development, all the way to finished product, deployment and future upscaling. You can keep a single architecture, avoid migrations, and keep your team focused on a reduced set of tools. All of this while still keeping the needed flexibility and ability to change the solution according to market needs.

10. KiVi - LeanXcale Direct Datastore API

By now, the benefits of using LeanXcale should be obvious. However, this is only the case of using a standard JDBC connection.

With KiVi, the LeanXcale Direct Datastore API, you can experience even greater performance increases as resources are added.

10.1. Throughput with KiVi

LOAD Throughput

throughput load

READ Throughput

throughput read

SCAN Throughput

throughput scan

UPDATE Throughput

throughput update

READ_UPDATE Throughput

throughput read update

11. KiVi - Conclusions

When using KiVi, we skip transactions management and query language processing software. This means that we lose the isolation level, which is very useful for bulk load/read scenarios. The increase in performance is notable.

11.1. Thread peaks

While performance does grow continuously, there are local peaks (at 10 threads) as we increase the number of clients. This means that we are still far away from saturation point for the given machines.

11.2. Single-thread processing

The LeanXcale datastore uses high-performance single-thread processes in order to reduce the amount of context switch. The drawback is that as the number of threads increases, each thread is less effective. Still, the overall effect is increased performance.

12. The upsides of LeanXcale

LeanXcale provide major benefits with strong impact on important areas of interest:

a) Short time to market: With LeanXcale, you only need one database for all your needs. This means simpler architecture, fewer technologies to master while being able to easily adapt to changed requirements and workload needs.

b) Low total cost of ownership: We squeeze every single cycle of CPU, so you get better performance of the same machine, paying less at the end of the month for the same (better) service.

c) Extreme scalability: When you achieve success, LeanXcale will be ready to handle any workload, and you won’t need to change your architecture to fit even the most extreme performance demands.

12.1. Suitability

The great flexibility and versatility of LeanXcale makes it perfect for implementation in a startup, without sacrificing the needs of major enterprises. Your database solution can be created quickly and allows for expansion, modification and growth in pace with your other development.

There is no need to make separate decisions for your quick-and-dirty test implementation and your finalized enterprise product. LeanXcale will follow you all the way.