MariaDB vs LeanXcale

1. Introduction

LeanXcale is a versatile database that combines SQL functionality with NoSQL characteristics. This allows for a range of benefits in flexibility, scalability and maintenance.

In this article we compare performance for LeanXcale with that of MariaDB, using fair workload scenarios. We will show how you can experience marked improvements when choosing LeanXcale over MariaDB.

In addition, LeanXcale provides KiVi, a distributed storage engine which provides additional benefits for all workloads.

1.1. Benchmarking Tool

YCSB, the Yahoo Cloud Serving Benchmark, an open-source specification and program suite commonly used to compare performance of NoSQL database systems.

1.2. Method

The comparison is made by generating emulated workload with YCSB and letting the YCSB clients connect to the databases over JDBC.

1.3. Scaling

For a more realistic application, we will take a look at behavior for clusters, to show how LeanXcale shines as the needs of the application grows. And we will show how LeanXcale and MariaDB clusters behave against the same workloads.

1.4. Setup

As this is a benchmark article and not a tutorial, we will not go through in detail how to set up the databases and clusters. We do point out relevant differences where needed.

2. Machine Specifications

In the benchmark comparison we are using 4 Amazon AWS instances, 1 for the YCSB clients generating the workload, and 3 for the databases.

2.1. Execution

YCSB clients use a lot of memory, so YCSB Client processes are running on a separate machine than the ones used for the databases cluster in order to do not disturb the metrics.

YCSB Execution plan
YCSB uses a 2 stage execution plan, with a load stage (clients only execute INSERTS statements in order to get a full working set) followed by a run stage (UPDATES, GETS and SCANS).

2.2. Machines

All machines used are hosted at Amazon AWS. We have requested spot instances and built the database clusters by hand. The machines that we are using are “r3.xlarge”, so they have the following specs:

*Instance Family   *

Instance Type

Processor Arch

vCPU

Memory (GiB)

Instance Storage (GB)

EBS-optimized Available

Network Performance

Memory optimized

r3.xlarge

64 bits

4

30,5

1 x 80

Yes

Moderate

2.3. Software

The machine software specifications are as follows:

  • Ubuntu 16.04 as SO

  • Java Open JDK 8

  • Python 2

  • MariaDB:

    • Maria DB 10.0.38

    • Galera Cluster 3.20

  • LeanXcale:

    • LeanXcale 0.96-SNAPSHOT

    • KiVi-0.132

3. Data Model

The data model obviously will be the same for both databases because the process (YCSB Clients) that will run the benchmark is the same just modifying the connection URL.

The benchmark will be based on a table with 1 string field as primary key, a 10 more string fields filled with random characters.

Example 1. Table creation
CREATE TABLE usertable (YCSB_KEY varchar(100) PRIMARY KEY,

FIELD0 varchar(255),

FIELD1 varchar(255), FIELD2 varchar(255),

FIELD3 varchar(255), FIELD4 varchar(255),

FIELD5 varchar(255), FIELD6 varchar(255),

FIELD7 varchar(255), FIELD8 varchar(255),

FIELD9 varchar(255));

3.1. Isolation Level

LeanXcale DB supports two isolation levels: Read Committed (default) and Snapshot Isolation as to provide read consistent view of the database to all transactions. For this benchmark we are using the default Read Committed isolation level.

MariaDB will be set up with Read Committed as as well, which by default implements Repeatable Reads.

4. Workloads

The workloads used in the benchmark have been selected to give a fair representation of behavior for both LeanXcale and MariaDB, and are not weighted in any way towards or against either.

No auto-commit
The auto-commit function for both LeanXcale and MariaDB is disabled. This is because we want to be able to manage operations in batch to improve performance.

4.1. Scenarios

In the benchmark, we use 5 workload scenarios, 4 one-operation scenarios, and 1 mixed-operations scenario.

4.1.1. Single

  • Load: pure INSERT statements with batch size of 1000 rows:

    • LeanXcale: 400000 rows

    • MariaDB: 40000 rows

  • Read: pure SELECT statements based on the primary key, getting 50000 rows.

  • Update: pure UPDATE statements based on the primary key, updating 50000 rows.

  • Scan: pure SELECT statements from a random key till the end of the table, uniform distribution around 1000 rows

Magnitude
As you can see, we are using an order of magnitude fewer inserts in MariaDB than in LeanXcale. The reason for this is that with 400000 rows, MariaDB would take too long to finish for a reasonable test. And if we were to use 40000 rows in LeanXcale, it would finish so quickly to show any meaningful information.

4.1.2. Mixed

  • Read + Update: mixed SELECT with primary key, and UPDATE statements, 25000 rows for each one.

4.2. Clients

For good workload result, we vary the number of YCSB clients and number of threads per client:

  • Number of clients: 10, 23, 35, 50

  • Number of Threads: 10, 23, 35, 50

All possible combinations are used in order to get a reasonable distribution.

5. Metrics

In the benchmark we collect Throughput (operations per second) and Average Latency (µs) of the operations launched.

All metrics are measured from the YCSB client side for fairness.

6. Setup

6.1. MariaDB Setup

The MariaDB setup follows the standard installation process, and is set up with the previously described data model.

6.2. JDBC

The MariaDB JDBC driver (Java 8 connector) is named mariadb-java-client-2.4.1.jar. We’ll put that file in the ycsb-<version>/jdbc-binding/lib directory.

6.3. Isolation

In order to get fair results, the isolation level for MariaDB needs to match the LeanXcale, so MariaDB is configured to use transaction-isolation = READ-COMMITTED in the server.cnf file.

6.4. Memory

In order to get the best possible performance out of MariaDB, the memory usage distribution needs to be optimized for the the loads that we are going to execute, so we’ll set key_buffer_size = 2G and innodb_buffer_pool_size = 5G in the server.cnf file. The innodb_buffer_pool_size is specially useful for insert intensive loads.

6.5. Cluster

MariaDB Galera Cluster is a synchronous multi-master cluster for MariaDB with XtraDB/InnoDB storage engines. It is installed and managed separately from MariaDB.

Each MariaDB instance needs to have the galera.cnf file indicating the IP of the machine and all other machines in the cluster.

6.6. Start

To start the cluster, the master needs to be started first with galera_new_cluster on the master node, and then the slave nodes can be started.

6.7. YCSB clients

When the cluster is up and runnning, YCSB Clients can be started, using the MariaDB JDBC client as interface.

7. LeanXcale Setup

LeanXcale is aimed towards clusters out of the box, so there is no special installation of a clustering tool. All that is required is to update the configuration inventory file to contain the names of the machines on which you are going to install the database.

7.1. JDBC

The LeanXcale Java JDBC driver can be downloaded from the Drivers page, and is placed in the ycsb-<version>/jdbc-binding/lib directory.

7.2. Automatic clustering

When the driver is in place there is no need to change anything else. All the cluster configuration and load distribution is automated by LeanXcale.

7.3. YCSB clients

When the cluster is up and runnning, YCSB Clients can be started, using the MariaDB JDBC client as interface.

8. Results

Chart Legend
In the charts below, the Test column shows Number of Clients - Number of Threads. For example: 23/50 means 23 clients with 50 threads per client.

8.1. Throughput

If we look at the throughput data as graphs, the picture becomes rather clear. As the load increases, it becomes increasingly important how the resources in a cluster are used.

8.1.1. READ Performance

READ Throughput

throughput read

The READ operation shows how the performance of MariaDB remains static, with a slight dropoff as large client numbers create more overhead.

For LeanXcale, the case is clearly different: Performance continues to increase. As soon as we add clients, the performance goes up, and we can maintain stable perforamnce by increasing the number of threads.

8.1.2. UPDATE Performance

UPDATE Throughput

throughput update

With the UPDATE operation, the picture is similar to READ. MariaDB provides a reasonably stable performance, with a general slight increase in performance as clients and threads are added.

Meanwhile, LeanXcale peformance keeps rising as we add clients and threads.

8.1.3. READ_UPDATE Performance

READ_UPDATE Throughput

throughput read update

In the case of the READ_UPDATE mixed operation, unsurprisingly, we get a similar outcome as for READ with some UPDATE aspects. MariaDB performance is reasonably stable, but does not improve for increased loads.

LeanXCale performance increases as we add clients and maintain stability with more threads, and at very high loads we experience a slight dropoff.

8.1.4. LOAD Performance

LOAD Throughput

throughput load

The LOAD operation workload shows a different pattern from the other cases.

For both databases, increasing the number of threads per client causes a performance decrease. But there is an important difference.

MariaDB has a performance peak at 10 threads, and this peak remains constant over increased number of clients.

LeanXcale on the other hand, also reaches a peak at 10 threads, but this peak increases as we increase the number of clients.

In practice this means that while the MariaDB cluster reaches a bottleneck, the LeanXcale cluster continues to be able to accept more load.

8.1.5. SCAN Performance

SCAN Throughput

throughput scan

With a SCAN operation workload the performance gap between the databases is huge. This is an architectural problem.

MariaDB performance not only is low, it also is a flat line, because the load is not automatically distributed over the nodes. Even though we have manually distributed nodes over the machines, we still cannot increase performance.

LeanXcale on the other hand does all the balancing and load distribution work for itself, which allows us to keep the same familiar profile: as we add clients and threads, performance keeps going up.

8.2. Latency Performance

8.2.1. LOAD Latency

LOAD Latency

latency load

  • Load: Latency at this stage in both databases behaves in a very similar way, which says a lot about LeanXcale because it can reach great Throughput marks without increasing the latency.

8.2.2. READ Latency

READ Latency

latency read

8.2.3. UPDATE Latency

UPDATE Latency

latency update

8.2.4. READ_UPDATE Latency

READ_UPDATE Latency

latency read update

  • Read, Update, Read_Update: At this stage the MariaDB latency is increasing with the number of clients-threads, besides LeanXcale latency keep constant. This is due to the fact that MariaDB is overloaded and while is not able to increase the throughput, latency is growing more and more.

8.2.5. SCAN Latency

SCAN Latency

latency scan

  • Scan: The behaviour at previous scenarios now is shown much clear and multiplied by one magnitude order. MariaDB latency grows till points where is practically not usable, while LeanXcale keeps a constant controlled latency. This is related with the flow control mechanism and the scan-streaming processes implemented by LeanXcale.

9. The impact of architecture

As should be clear from the previous discussion, the differences between MariaDB and LeanXcale are an effect of how resources are managed and used.

9.1. Balancing and load distribution

9.1.1. MariaDB - master/slave architecture

A major drawback of the MariaDB architecture is that it uses a Master-Slave relationship. This means there is no method to automatically distribute load between the nodes. This has to be handled by the application.

In order to balance the load, you can manually spread the load by distributing the clients uniformly over the available machines. This does solve the balancing problem, but does not provide an increase in performance.

9.1.2. LeanXcale - automatic management

Since LeanXcale has the ability of add and remove nodes dynamically, load is automatically distributed. In addition to this, LeanXcale performance continues to increase as the number of clients is increased. Also, as threads are added to each client, the performance remains stable.

9.2. Cluster management

9.2.1. MariaDB - Galera clusters

A significant drawback is that a Galera Cluster configuration is static, so you need to stop and restart the database in order to re-size the cluster or make other changes.

There are additional complexities for starting and stopping Galera Clusters, since they are dependendent on the machine order. This may involve needing to modify configuration files if the cluster has gone to an inconsistent state.

9.2.2. LeanXcale

With LeanXcale, changing the cluster is a fully automatic and dynamic activity. There is no need to interrupt the system since nodes can be added on the fly.

9.3. Processing

9.3.1. MariaDB - master/slave

With the master/slave architecture, MariaDB, the mechanism becomes “do one write and send the information to other machines the cluster”. This mechanism generates a huge overhead.

The architecture also means that aggregated queries cannot be processed in parallel at cluster level.

10. KiVi - LeanXcale Direct Datastore API

By now, the benefits of using LeanXcale should be obvious. However, this is only the case of using a standard JDBC connection.

With KiVi, the LeanXcale Direct Datastore API, you can experience even greater performance increases as resources are added.

10.1. Throughput with KiVi

LOAD Throughput

throughput load

READ Throughput

throughput read

SCAN Throughput

throughput scan

UPDATE Throughput

throughput update

READ_UPDATE Throughput

throughput read update

11. Total Cost of Ownership

When calculating the total cost of ownership of your database solution it is easy to ignore the impact of performance and latency. Hoever, the actual running costs over time can be very severely impacted by how the database operates.

In this benchmark, the different costs for MariaDB and LeanXcale are in definite contrast:

Table 1. Infrastructure cost (USD/Bopm)
DB writes reads

MariaDB

8.086

10.78

LeanXcale

2.021

0.183

11.1. TCO conclusions

LeanXcale operates at 4 times smaller in writes and 58.9 times smaller in reads. The total cost is decidedly lower for the same or better service levels.

12. Time To Market

When developing your solution you want to move quickly from concept to market delivery. The time to market for any new solution is a very important, yet often overlooked aspect of the development decision process. Moving from development to scaling up to a finished product needs to be a smooth transition. Steep learning curves or slow deployment in the initial development can be as detrimental as complicated migrations or basic technology changes at a late stage, and both can have severely detrimental effects on the bottom line.

The ideal database provides a reasonable learning curve, with a versatile set of options and the possobility of deeper sopistication and specialization. You want to be able to do quickly, and then be able to scale up to production levels of complexity without too much of context switching for the developers.

12.1. TTM conclusions

LeanXcale lets you use the same database for prototyping, initial development, all the way to finished product, deployment and future upscaling. You can keep a single architecture, avoid migrations, and keep your team focused on a reduced set of tools. All of this while still keeping the needed flexibility and ability to change the solution according to market needs.

13. KiVi - Conclusions

When using KiVi, we skip transactions management and query language processing software. This means that we lose the isolation level, which is very useful for bulk load/read scenarios. The increase in performance is notable.

13.1. Thread peaks

While performance does grow continuously, there are local peaks (at 10 threads) as we increase the number of clients. This means that we are still far away from saturation point for the given machines.

13.2. Single-thread processing

The LeanXcale datastore uses high-performance single-thread processes in order to reduce the amount of context switch. The drawback is that as the number of threads increases, each thread is less effective. Still, the overall effect is increased performance.

14. The upsides of LeanXcale

LeanXcale provide major benefits with strong impact on important areas of interest:

a) Short time to market: With LeanXcale, you only need one database for all your needs. This means simpler architecture, fewer technologies to master while being able to easily adapt to changed requirements and workload needs.

b) Low total cost of ownership: We squeeze every single cycle of CPU, so you get better performance of the same machine, paying less at the end of the month for the same (better) service.

c) Extreme scalability: When you achieve success, LeanXcale will be ready to handle any workload, and you won’t need to change your architecture to fit even the most extreme performance demands.

14.1. Suitability

The great flexibility and versatility of LeanXcale makes it perfect for implementation in a startup, without sacrificing the needs of major enterprises. Your database solution can be created quickly and allows for expansion, modification and growth in pace with your other development.

There is no need to make separate decisions for your quick-and-dirty test implementation and your finalized enterprise product. LeanXcale will follow you all the way.