MariaDB vs LeanXcale
1. Introduction
LeanXcale is a versatile database that combines SQL functionality with NoSQL characteristics. This allows for a range of benefits in flexibility, scalability and maintenance.
In this article we compare performance for LeanXcale with that of MariaDB, using fair workload scenarios. We will show how you can experience marked improvements when choosing LeanXcale over MariaDB.
In addition, LeanXcale provides KiVi, a distributed storage engine which provides additional benefits for all workloads.
1.1. Benchmarking Tool
YCSB, the Yahoo Cloud Serving Benchmark, an open-source specification and program suite commonly used to compare performance of NoSQL database systems.
1.2. Method
The comparison is made by generating emulated workload with YCSB and letting the YCSB clients connect to the databases over JDBC.
2. Machine Specifications
In the benchmark comparison we are using 4 Amazon AWS instances, 1 for the YCSB clients generating the workload, and 3 for the databases.
2.1. Execution
YCSB clients use a lot of memory, so YCSB Client processes are running on a separate machine than the ones used for the databases cluster in order to do not disturb the metrics.
YCSB Execution plan
YCSB uses a 2 stage execution plan, with a load stage (clients only execute INSERTS statements in order to get a full working set) followed by a run stage (UPDATES, GETS and SCANS).
|
2.2. Machines
All machines used are hosted at Amazon AWS. We have requested spot instances and built the database clusters by hand. The machines that we are using are “r3.xlarge”, so they have the following specs:
*Instance Family * |
Instance Type |
Processor Arch |
vCPU |
Memory (GiB) |
Instance Storage (GB) |
EBS-optimized Available |
Network Performance |
Memory optimized |
r3.xlarge |
64 bits |
4 |
30,5 |
1 x 80 |
Yes |
Moderate |
3. Data Model
The data model obviously will be the same for both databases because the process (YCSB Clients) that will run the benchmark is the same just modifying the connection URL.
The benchmark will be based on a table with 1 string field as primary key, a 10 more string fields filled with random characters.
CREATE TABLE usertable (YCSB_KEY varchar(100) PRIMARY KEY,
FIELD0 varchar(255),
FIELD1 varchar(255), FIELD2 varchar(255),
FIELD3 varchar(255), FIELD4 varchar(255),
FIELD5 varchar(255), FIELD6 varchar(255),
FIELD7 varchar(255), FIELD8 varchar(255),
FIELD9 varchar(255));
3.1. Isolation Level
LeanXcale DB supports two isolation levels: Read Committed (default) and Snapshot Isolation as to provide read consistent view of the database to all transactions. For this benchmark we are using the default Read Committed isolation level.
MariaDB will be set up with Read Committed as as well, which by default implements Repeatable Reads.
4. Workloads
The workloads used in the benchmark have been selected to give a fair representation of behavior for both LeanXcale and MariaDB, and are not weighted in any way towards or against either.
No auto-commit
The auto-commit function for both LeanXcale and MariaDB is disabled. This is because we want to be able to manage operations in batch to improve performance.
|
4.1. Scenarios
In the benchmark, we use 5 workload scenarios, 4 one-operation scenarios, and 1 mixed-operations scenario.
4.1.1. Single
-
Load: pure
INSERT
statements with batch size of 1000 rows:-
LeanXcale:
400000 rows
-
MariaDB:
40000 rows
-
-
Read: pure
SELECT
statements based on the primary key, getting50000 rows
. -
Update: pure
UPDATE
statements based on the primary key, updating50000 rows
. -
Scan: pure
SELECT
statements from a random key till the end of the table, uniform distribution around1000 rows
Magnitude
As you can see, we are using an order of magnitude fewer inserts in MariaDB than in LeanXcale. The reason for this is that with 400000 rows, MariaDB would take too long to finish for a reasonable test. And if we were to use 40000 rows in LeanXcale, it would finish so quickly to show any meaningful information.
|
5. Metrics
In the benchmark we collect Throughput (operations per second) and Average Latency (µs) of the operations launched.
All metrics are measured from the YCSB client side for fairness.
6. Setup
6.1. MariaDB Setup
The MariaDB setup follows the standard installation process, and is set up with the previously described data model.
6.2. JDBC
The MariaDB JDBC driver (Java 8 connector) is named mariadb-java-client-2.4.1.jar
. We’ll put that file in the ycsb-<version>/jdbc-binding/lib
directory.
6.3. Isolation
In order to get fair results, the isolation level for MariaDB needs to match the LeanXcale, so MariaDB is configured to use transaction-isolation = READ-COMMITTED
in the server.cnf
file.
6.4. Memory
In order to get the best possible performance out of MariaDB, the memory usage distribution needs to be optimized for the the loads that we are going to execute, so we’ll set key_buffer_size = 2G
and innodb_buffer_pool_size = 5G
in the server.cnf
file. The innodb_buffer_pool_size
is specially useful for insert intensive loads.
6.5. Cluster
MariaDB Galera Cluster is a synchronous multi-master cluster for MariaDB with XtraDB/InnoDB storage engines. It is installed and managed separately from MariaDB.
Each MariaDB instance needs to have the galera.cnf
file indicating the IP of the machine and all other machines in the cluster.
7. LeanXcale Setup
LeanXcale is aimed towards clusters out of the box, so there is no special installation of a clustering tool. All that is required is to update the configuration inventory file to contain the names of the machines on which you are going to install the database.
7.1. JDBC
The LeanXcale Java JDBC driver can be downloaded from the Drivers page, and is placed in the ycsb-<version>/jdbc-binding/lib
directory.
8. Results
Chart Legend
In the charts below, the Test column shows Number of Clients - Number of Threads. For example: 23/50 means 23 clients with 50 threads per client.
|
8.1. Throughput
If we look at the throughput data as graphs, the picture becomes rather clear. As the load increases, it becomes increasingly important how the resources in a cluster are used.
8.1.1. READ Performance
The READ
operation shows how the performance of MariaDB remains static, with a slight dropoff as large client numbers create more overhead.
For LeanXcale, the case is clearly different: Performance continues to increase. As soon as we add clients, the performance goes up, and we can maintain stable perforamnce by increasing the number of threads.
8.1.2. UPDATE Performance
With the UPDATE
operation, the picture is similar to READ
. MariaDB provides a reasonably stable performance, with a general slight increase in performance as clients and threads are added.
Meanwhile, LeanXcale peformance keeps rising as we add clients and threads.
8.1.3. READ_UPDATE Performance
In the case of the READ_UPDATE
mixed operation, unsurprisingly, we get a similar outcome as for READ
with some UPDATE
aspects. MariaDB performance is reasonably stable, but does not improve for increased loads.
LeanXCale performance increases as we add clients and maintain stability with more threads, and at very high loads we experience a slight dropoff.
8.1.4. LOAD Performance
The LOAD
operation workload shows a different pattern from the other cases.
For both databases, increasing the number of threads per client causes a performance decrease. But there is an important difference.
MariaDB has a performance peak at 10 threads, and this peak remains constant over increased number of clients.
LeanXcale on the other hand, also reaches a peak at 10 threads, but this peak increases as we increase the number of clients.
In practice this means that while the MariaDB cluster reaches a bottleneck, the LeanXcale cluster continues to be able to accept more load.
8.1.5. SCAN Performance
With a SCAN
operation workload the performance gap between the databases is huge. This is an architectural problem.
MariaDB performance not only is low, it also is a flat line, because the load is not automatically distributed over the nodes. Even though we have manually distributed nodes over the machines, we still cannot increase performance.
LeanXcale on the other hand does all the balancing and load distribution work for itself, which allows us to keep the same familiar profile: as we add clients and threads, performance keeps going up.
8.2. Latency Performance
8.2.1. LOAD Latency
-
Load: Latency at this stage in both databases behaves in a very similar way, which says a lot about LeanXcale because it can reach great Throughput marks without increasing the latency.
8.2.4. READ_UPDATE Latency
-
Read, Update, Read_Update: At this stage the MariaDB latency is increasing with the number of clients-threads, besides LeanXcale latency keep constant. This is due to the fact that MariaDB is overloaded and while is not able to increase the throughput, latency is growing more and more.
8.2.5. SCAN Latency
-
Scan: The behaviour at previous scenarios now is shown much clear and multiplied by one magnitude order. MariaDB latency grows till points where is practically not usable, while LeanXcale keeps a constant controlled latency. This is related with the flow control mechanism and the scan-streaming processes implemented by LeanXcale.
9. The impact of architecture
As should be clear from the previous discussion, the differences between MariaDB and LeanXcale are an effect of how resources are managed and used.
9.1. Balancing and load distribution
9.1.1. MariaDB - master/slave architecture
A major drawback of the MariaDB architecture is that it uses a Master-Slave relationship. This means there is no method to automatically distribute load between the nodes. This has to be handled by the application.
In order to balance the load, you can manually spread the load by distributing the clients uniformly over the available machines. This does solve the balancing problem, but does not provide an increase in performance.
9.1.2. LeanXcale - automatic management
Since LeanXcale has the ability of add and remove nodes dynamically, load is automatically distributed. In addition to this, LeanXcale performance continues to increase as the number of clients is increased. Also, as threads are added to each client, the performance remains stable.
9.2. Cluster management
9.2.1. MariaDB - Galera clusters
A significant drawback is that a Galera Cluster configuration is static, so you need to stop and restart the database in order to re-size the cluster or make other changes.
There are additional complexities for starting and stopping Galera Clusters, since they are dependendent on the machine order. This may involve needing to modify configuration files if the cluster has gone to an inconsistent state.
9.3. Processing
9.3.1. MariaDB - master/slave
With the master/slave architecture, MariaDB, the mechanism becomes “do one write and send the information to other machines the cluster”. This mechanism generates a huge overhead.
The architecture also means that aggregated queries cannot be processed in parallel at cluster level.
10. KiVi - LeanXcale Direct Datastore API
By now, the benefits of using LeanXcale should be obvious. However, this is only the case of using a standard JDBC connection.
With KiVi, the LeanXcale Direct Datastore API, you can experience even greater performance increases as resources are added.
11. Total Cost of Ownership
When calculating the total cost of ownership of your database solution it is easy to ignore the impact of performance and latency. Hoever, the actual running costs over time can be very severely impacted by how the database operates.
In this benchmark, the different costs for MariaDB and LeanXcale are in definite contrast:
DB | writes | reads |
---|---|---|
MariaDB |
8.086 |
10.78 |
LeanXcale |
2.021 |
0.183 |
12. Time To Market
When developing your solution you want to move quickly from concept to market delivery. The time to market for any new solution is a very important, yet often overlooked aspect of the development decision process. Moving from development to scaling up to a finished product needs to be a smooth transition. Steep learning curves or slow deployment in the initial development can be as detrimental as complicated migrations or basic technology changes at a late stage, and both can have severely detrimental effects on the bottom line.
The ideal database provides a reasonable learning curve, with a versatile set of options and the possobility of deeper sopistication and specialization. You want to be able to do quickly, and then be able to scale up to production levels of complexity without too much of context switching for the developers.
12.1. TTM conclusions
LeanXcale lets you use the same database for prototyping, initial development, all the way to finished product, deployment and future upscaling. You can keep a single architecture, avoid migrations, and keep your team focused on a reduced set of tools. All of this while still keeping the needed flexibility and ability to change the solution according to market needs.
13. KiVi - Conclusions
When using KiVi, we skip transactions management and query language processing software. This means that we lose the isolation level, which is very useful for bulk load/read scenarios. The increase in performance is notable.
14. The upsides of LeanXcale
LeanXcale provide major benefits with strong impact on important areas of interest:
a) Short time to market: With LeanXcale, you only need one database for all your needs. This means simpler architecture, fewer technologies to master while being able to easily adapt to changed requirements and workload needs.
b) Low total cost of ownership: We squeeze every single cycle of CPU, so you get better performance of the same machine, paying less at the end of the month for the same (better) service.
c) Extreme scalability: When you achieve success, LeanXcale will be ready to handle any workload, and you won’t need to change your architecture to fit even the most extreme performance demands.
14.1. Suitability
The great flexibility and versatility of LeanXcale makes it perfect for implementation in a startup, without sacrificing the needs of major enterprises. Your database solution can be created quickly and allows for expansion, modification and growth in pace with your other development.
There is no need to make separate decisions for your quick-and-dirty test implementation and your finalized enterprise product. LeanXcale will follow you all the way.