Installation Guide

1. Introduction

LeanXcale is an ultrascalable distributed database. It is built from several components that can scale out to cope with your workload. For the details, and to get the most out of your deployment, you should read the Architecture and Concepts documents.

You can install LeanXcale on Ubuntu, CentOS, or Red Hat (RHEL). We’ve tested this Install Guide with Ubuntu 20.04 LTS, Ubuntu 18.04 LTS and Red Hat Enterprise Linux 8.

If you need to install LeanXcale on any other modern Linux distribution, please contact the LeanXcale support team by email.

2. Installing LeanXcale

2.1. Prerequisites

Before the installation can run, all the machines in the cluster must meet some prerequisites. These are summarized as follows:

  • You need the following (very common) components to be installed:

    • Python 3 (>= 3.6) including pip3. pip3 will later be used by the deployment scripts to install the python dependencies for some tools.

    • Ansible >= 2.9

    • JAVA release 11

    • bash

    • netcat

    • libpthread

    • lsof

In multi-node setups, the Master node of LeanXcale needs to be able to communicate with the rest of the nodes over SSH. There are two ways to do this, described in Appendix A. If you’re using a single-node setup, you can skip this step.
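A quick way to check that the prerequisite tools are available on a node is a loop like the following (a convenience sketch, not part of the LeanXcale tooling):

```shell
# Report whether each prerequisite command is on the PATH
for cmd in python3 pip3 ansible java bash nc lsof; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: OK"
  else
    echo "$cmd: MISSING"
  fi
done
```

Run it on every node and install anything reported as MISSING before proceeding.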

2.1.1. Base directory

You need a base folder for the installation. If you are setting up a cluster, this directory has to be the same on all machines.

The database manager (the OS user that operates the database: deploys, starts, stops, …​) must have rwx permissions in this folder.

user@host:~$ mkdir lxs
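To confirm that the database manager user has the required rwx permissions on the base folder, a quick check like this can help (the directory name follows the example above):

```shell
# Create the base folder if needed and verify rwx permissions for the current user
mkdir -p "$HOME/lxs"
if [ -r "$HOME/lxs" ] && [ -w "$HOME/lxs" ] && [ -x "$HOME/lxs" ]; then
  echo "permissions OK"
else
  echo "permissions missing"
fi
```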

2.1.2. Filesystems

This installation procedure also assumes that all filesystems are already created and mounted, so there is space available for the installation and for the database data.

By default, unless you configure the inventory to use an alternative folder, the database data is stored in the LX-DATA sub-folder of the base directory. You can mount the filesystem that will hold your data there.

If you want hot-backup capabilities you must use ZFS, because the hot-backup functionality relies on ZFS snapshots. ZFS is also recommended for compression, but it has a cost in terms of the memory that has to be allocated for ZFS, and it may have some performance penalty in high-insert workloads.

If hot-backup is not a requirement, any other filesystem can work (ext4, …​).

2.1.3. Installing dependencies in Ubuntu 20.04

// You have to run these commands on all the nodes of the cluster

sudo apt update
sudo apt install -y ansible python3-psutil python3-pip openjdk-11-jdk lsof libffi-dev rustc netcat sysstat

2.1.4. Installing dependencies in Ubuntu 18.04

// You have to run these commands on all the nodes of the cluster

sudo apt update
sudo apt install -y ansible python-psutil python3-pip openjdk-11-jdk lsof libffi-dev netcat rustc sysstat
pip3 install --upgrade pip

2.1.5. Installing dependencies in Red Hat Enterprise Linux 8

// You have to run these commands on all the nodes of the cluster

sudo yum -y install python3 python3-devel gcc nmap-ncat java-11-openjdk lsof libffi-devel rustc sysstat
pip3 install ansible --user

2.1.6. Check Java configuration

java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)

If you installed JAVA 11, but have another version of JAVA set as default, you may need to set up JAVA_HOME so JAVA 11 is used.
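For example, you could point JAVA_HOME at the OpenJDK 11 installation like this (the path shown is the usual Ubuntu default and may differ on your system):

```shell
# Select Java 11 explicitly for this shell session
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
echo "JAVA_HOME=$JAVA_HOME"
```

Adding these lines to the database manager’s .bashrc makes the selection permanent.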

2.2. Unpacking

The first step for installation is choosing a machine from the cluster to be the Master (the orchestrator of the cluster). On the chosen machine, change to the directory you just created and decompress the LeanXcale binaries there:

user@host:~$ cd lxs
user@host:~/lxs$ tar xvf ../LeanXcale_{version-x.y.z}_latest.tgz
# Yes, this has to be done manually before installing the requirements to avoid problems
user@host:~/lxs$ python3 -m pip install cryptography
user@host:~/lxs$ python3 -m pip install -r LX-BIN/scripts/requirements.txt

Before running any admin command, we advise executing the script that you’ll find at the root of the installation folder, because it sets some variables and aliases that you’ll find very useful when you manage your LeanXcale instance:

# Change to the directory where you installed LeanXcale first
user@host:~$ cd lxs
user@host:~/lxs$ source ./

We recommend putting this line in your .bashrc so it gets executed each time you log in:

source ~/lxs/ ~/lxs

Now the master server of the cluster is ready to be configured so you can later deploy the installation to the rest of the servers.

2.3. Simple Configuration: the inventory file

To configure the cluster, the basic information you need is just the hostnames of the machines you want to deploy to.

With that information, you can just configure the servers of the cluster in the inventory file:

user@host:~/lxs$ vim conf/inventory

The following is an example of a simple configuration file:

BASEDIR="{{ lookup('env','BASEDIR') }}"

# Login account for the cluster
USER="{{ lookup('env','USER') }}"

# Cluster name

#path to backup directory

# Resource sizing
SIZE={"MEM": "128G", "NCORES": 12}

# Data filesystems
FILESYSTEM={"KVDS": ["/tmp/KVDS-1", "/tmp/KVDS-2"], "LgLTM": "/tmp/LOGS"}

# Metadata servers
[meta] ansible_connection=local

#Datastore Servers. You can have multiple data store instances
[datastores] ansible_connection=ssh

COMPONENTS=['ZK', 'MtM', 'LgCmS', 'KVMS', 'LXIS', 'CflM']
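The SIZE and FILESYSTEM values are Python-style dictionaries, so a quick way to catch syntax mistakes (such as a missing quote) before running the installer is to parse the value with Python (a convenience sketch, not official LeanXcale tooling):

```shell
# Parse a SIZE value as a Python literal; a syntax error here would also break the installer
SIZE='{"MEM": "128G", "NCORES": 12}'
python3 -c 'import ast, sys; d = ast.literal_eval(sys.argv[1]); print(d["MEM"], d["NCORES"])' "$SIZE"
```

If the value is well-formed, this prints the parsed fields; otherwise Python reports the syntax error.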


2.4. Completing the Installation

Once you have finished changing the configuration file, you can complete the installation. This last step will build your detailed configuration file including the resource allocation, and deploy the installation to the rest of the nodes of the cluster.

This step is needed even if you have a single-machine deployment, because it builds some internal configuration files that are needed to run, and it installs python dependencies through pip.

user@host:~/lxs$ admin/

After a while, you’ll see the result in the logs printed on screen. You should get failed=0 as the result for every host:

PLAY RECAP *********************************************************************
 : ok=15   changed=8    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
 : ok=15   changed=8    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
 : ok=18   changed=7    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

If you had any problem with this, check pip and whether the user can install python packages.
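To check which pip is being used and where user-level packages go (helpful when diagnosing these pip problems):

```shell
# Show the pip version and module path in use by python3
python3 -m pip --version
# Show where user-level python packages are installed
python3 -m site --user-site
```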

From this point the cluster is installed and set up to be started. You can now go to the Operations manual and read how to run the database cluster and check everything is running OK.

2.5. Default Resource Allocation

The default configuration is based on some workload assumptions that may not hold for your application, so this configuration may not make the most of your resources.

The default is to use all memory, all sockets and all cores, and to use the same filesystems as BASEDIR, but you can change those easily to fit your needs (mainly if you don’t want to allocate all resources to LeanXcale).

The line starting with SIZE defines the default resource allocation. In section [all:vars] it assumes that all machines in the cluster have the same HW configuration; however, these parameters can be defined per machine or per group if you need to override the default sizing. The line has the following parameters:

  • mem: Memory available in the machine to run the components (in GB). If not set, it defaults to taking all the memory in each machine and distributing it for LeanXcale.

  • sockets: List of sockets in the machine to be used to run the components. Values like 0-2 or 0,1,2 or 0,4 are allowed. On DataStore machines there should be two socket lists: the socket list for KVDS and the socket list for QE, separated by a colon ':'.

    Again, if not set, all sockets in the machine will be used.

  • ncores: The number of physical cores to be used in each machine to run LeanXcale components. Note that these are physical cores, so hyperthreads must not be counted.

  • nthreadscore: Number of hyperthreads per core

The SIZE line in the example above means: the machines have 128GB of memory available and 12 physical cores. Since sockets and hyperthreading are not specified, all sockets in each machine will be used and no hyperthreading is assumed.
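For reference, a SIZE line that sets all four parameters explicitly might look like the following. The SOCKETS and NTHREADSCORE key names are assumed here by analogy with MEM and NCORES, and the values are illustrative (32GB of memory, sockets 0 to 1, 8 physical cores, no hyperthreading):

```
SIZE={"MEM": "32G", "SOCKETS": "0-1", "NCORES": 8, "NTHREADSCORE": 1}
```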

The line starting with FILESYSTEM defines the filesystems where the following components save their data: KVDS, LgLTM, ZK, LgCmS and KVMS. Again, these params may be overridden on a per-group or per-machine basis. A list of paths may be used to distribute data from several instances of the same type of component. The default filesystem, if not specified, is BASEDIR/LX-DATA.

It is highly recommended to keep transaction redo logging on different disks from the database data.

The line in the example means: redo logging will be written to the filesystem mounted at /tmp/LOGS, and data from all KVDS in a machine will be distributed across /tmp/KVDS-1 and /tmp/KVDS-2.

2.6. Default Component Deployment

Components are typically classified into classes:

  • Metadata components:

    • Zookeeper

    • Configuration Manager

    • Snapshot Server

    • Commit Sequencer

    • KiVi Metadata Server

  • Multi-instance components:

    • Query Engine

    • Transactional Loggers

    • KiVi Datastore Server

    • Conflict Managers

  • Monitors and others: Non core components with auxiliary functionality

Given this classification, the metadata components are deployed on the metadata server, while the rest of the components are distributed considering the number of datastore servers and their HW capacity.

Components are listed in inventory with mnemonics and group aliases:

  • MIN: ['ZK', 'MtM', 'LgCmS', 'KVMS', 'LXIS', 'KVDS']

  • REC: ['ZK', 'MtM', 'LgCmS', 'KVMS', 'LXIS', 'KVDS', 'CflM', 'LgLTM', 'QE']

  • ALL: ['ZK', 'MtM', 'LgCmS', 'KVMS', 'LXIS', 'KVDS', 'CflM', 'LgLTM', 'QE', 'KVP', 'HAP', 'SplitMon', 'MemCtl', 'MonS', 'MonC', 'Superset', 'IM']

2.7. Default Component Start/Stop

Some components behave specially on cluster start/stop: they are excluded from these operations unless explicitly stated.

  • Excluded from default start: Superset

  • Excluded from default stop: Superset, MonS, MonC, LDAP, IM

3. Advanced Inventory configurations

The inventory section [all:vars] has many other variables and settings to fine-tune your cluster or deploy powerful features such as partitioning or high availability.

If a variable is not present or is set to 'no', the default configuration remains.

3.1. Forcing components config

You can tune the number of instances and the RAM for every component type:

FORCE={"QE": {"NUM": 1, "MEM": 8}}

localhost FORCE='{"KVDS": {"MEM": 4}}'  # for host vars, dicts should be enclosed in single quotes

FORCE={"KVMS": {"MEM": 2, "NUM": 1}, "QE": {"NUM": 2}}  # QE dict is overwritten

3.2. Bidimensional partitioning

You can define partitioning settings in a dictionary:

# _DEFAULT configures a 30% memory target for partitions and a 360-day data retention period
# The next entry configures Table1 to have 24h partitions and a 7-day data retention period
BIDIPT = {"_DEFAULT": "30%:360d", "db-APP-Table1": "24h:7d"}

3.3. Other settings

Following are other settings in the inventory:

  • SUDO. If the user has sudo permissions, add SUDO=sudo. Otherwise you may need to create some folders and grant the user access for some actions to work

  • NUMA. Set to yes if you want to use NUMA

  • NCLIENTSQE. Number of clients per Query Engine

  • MASTERALERTHTTP. URL for alert manager

  • MONITORING_CLIENTS. URL for monitoring components

  • EXTERNAL_ZK. List of external Zookeeper hosts

  • SECURITY. Security configuration for KiVi

  • EXTERNAL_KVP. Address for external KVProxy

  • EXTERNAL_LDAP. Address for external LDAP

  • EXTERNAL_HAPROXY. List of external HAProxy hosts

  • HAPROXY_BIN. Set path to binary executable.

4. High Availability

LeanXcale can be set up as a high-availability cluster. High availability means that in case of a single component failure or a single machine failure, the system will keep working non-stop.

You can further define the configuration parameters so the system can keep working in case of two-machine failure or even stronger high availability situations.

4.1. Configuring High Availability

Configuring High Availability is pretty simple. It just requires that you set the following configuration parameter in the inventory file:


4.2. Implications

The implications of configuring this parameter are:

  • The install script will check that there are at least three machines configured for the cluster.

There have to be at least 2 machines defined as metadata servers and 2 machines defined as datastore servers. One machine can act as both metadata and datastore, which is why there have to be at least 3 machines in the cluster.

The following configuration could be a valid HA configuration:

BASEDIR="{{ lookup('env','BASEDIR') }}"
USER="{{ lookup('env','USER') }}"



  • Components will be installed as follows:

    • Zookeeper: Zookeeper is the master for keeping the global configuration, health information and arbitration. The install script will configure a Zookeeper cluster with three members (always an odd number), replicated on different machines.

    • Transaction Loggers: The loggers log all transaction updates (called writesets) to persistent storage to guarantee the durability of transactions. The loggers will be configured and replicated in groups, so each replica of a logger runs on a different machine. Therefore the logging information is replicated, and it can be recovered in case of a crash.

    • Conflict Manager: Conflict Managers are in charge of detecting write-write conflicts among concurrent transactions. They are highly available per se, because if one Conflict Manager fails, its conflict key buckets are transferred to another Conflict Manager. No special configuration is needed for High Availability except guaranteeing that there are at least 2 Conflict Managers configured on 2 different machines.

    • Commit Sequencer: The commit sequencer is in charge of distributing commit timestamps to the Local Transactional Managers. Two Commit Sequencers will be configured, but only one will be acting as master and the other as follower. The follower will take over in case the master fails.

    • Snapshot Server: The Snapshot Server provides the freshest coherent snapshot on which new transactions can be started. As in the case of the Commit Sequencer, 2 snapshot servers will be started: one acting as master and the other as follower. The follower will take over in case the master fails.

    • Configuration Manager: The configuration manager handles system configuration and deployment information. It also monitors the other components. Two configuration managers will be started and one will be the master configuration manager.

    • Query Engine: The Query Engine parses SQL queries and transforms them into a query plan, which results in a set of actions on the datastores. The query engines are usually configured as a cluster in which any of them can take part of the load, so High Availability has no special requirement on them.

    • Datastores: There will usually be several datastores in different machines. The important point in datastores is replication.

4.3. Replication

So far, High availability configuration ensures the components are configured in a way that no component failure or machine crash will cause the system to fail.

While high availability is about component availability, data replication is about data availability: if a machine or a disk crashes, there is another copy of the data, so you can keep working on that copy. Data replication can be enabled regardless of whether you also want high availability.

LeanXcale provides a full synchronous replication solution and you can configure as many replicas as you want. Basically, you can define a mirroring datastore configuration in the inventory file:

# MIRRORS = {"m1": ["metadataserver1", "dataserver3"]}

The MIRRORS var sets the number of servers in a mirror group. If set to 'no' or not present, there is no mirroring. If set to 'yes', it defaults to 2 servers for every mirror group, and all servers with datastores are mirrored.

When not all datastores are to be mirrored, the var may be either a list of inventory server groups or a dictionary with a list of hosts for every mirror group.
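For example, a per-mirror-group host dictionary might look like this (the host names are hypothetical):

```
MIRRORS={"m1": ["dataserver1", "dataserver2"], "m2": ["dataserver3", "dataserver4"]}
```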

Besides, there are 2 kinds of replicas:

  • Replicas to minimize the risk of losing data and being able to recover service as soon as possible.

  • Replicas to improve performance. These are usually small tables that are read very frequently, so the application can benefit from having several copies that can be read in parallel.

5. Authentication, SSL & Permissions

There are two options to configure LeanXcale authentication:

  • LDAP-based authentication. You can set up an LDAP server just for LeanXcale, but this is most interesting for integrating LeanXcale with your organization’s LDAP (or Active Directory) to provide some kind of Single Sign-On, or at least the chance to use a common password within the organization.

  • Open shared access. This is not to be used in production except for shared data. Access can be limited in the firewall with IP-based rules, but all connecting users will be granted access and will be able to use a user schema. This is very easy to set up for development and testing environments.

Permissions are granted through roles.

Communications can be configured so that SSL is used for connections between components, though it is recommended to run all components behind a firewall and to use JDBC over SSL and SSL for the clients' connections.

5.1. TLS for HAProxy

Securing communication between clients and the LeanXcale server is enabled by activating the TLS options during the deployment of the cluster.

If the server deployment includes HAProxy, the client-HAProxy links may be secured, but the intra-cluster HAProxy-QE links will not be. In contrast, direct client-QE links may be secured in the absence of HAProxy. That is, securing QE and securing HAProxy are mutually exclusive.

To start HAProxy with TLS activated (HTTPS) you need to:

  • Add this line in the [all:vars] section of the inventory.

    SECURE_COMUNICATION_HAP = {"CERTIFICATE": "/path/to/lxs-basedir/conf/server.pem"}

    The certificate file server.pem must always be in $BASEDIR/conf.
    This option is not compatible with SECURE_COMUNICATION_QE.

  • Once that value is defined, the Ansible installer adds the security option to haproxy.cfg, and HAProxy will start with TLS activated.

To create server.pem, you can run these steps:

# Generate a new private/public key pair if a pair from a trusted CA is not available
# Option -nodes lets HAProxy start without prompting for a password
openssl req -x509 -newkey rsa:2048 -sha256 -days 3650 -nodes \
  -keyout private.key -out client.crt \
  -subj "/ Dep./O=LeanXcale/L=Madrid/ST=Madrid/C=ES" \
  -addext ",DNS:localhost,IP:"
# Bundle Private_Key/Public_certificate into server.pem
cat private.key client.crt > server.pem

The subjectAltName option is a comma-separated list of server names, IPs or domains as accessed by clients. A typical production sample would be -addext ",IP:".
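After running the steps above you can verify that server.pem bundles both the key and the certificate. The sketch below first generates a throwaway self-signed pair so it is self-contained; the -subj value is illustrative:

```shell
# Generate a throwaway key pair and certificate, bundle them, and count the PEM blocks
openssl req -x509 -newkey rsa:2048 -sha256 -days 3650 -nodes \
  -keyout private.key -out client.crt -subj "/CN=localhost/O=LeanXcale" 2>/dev/null
cat private.key client.crt > server.pem
# A correct bundle contains two BEGIN markers: one private key and one certificate
grep -c "BEGIN" server.pem
```

If the output is 2, the bundle contains both pieces.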

To be sure that TLS has been configured and HTTPS is activated for the HAP communication:

curl -X OPTIONS --insecure https://localhost:1522

Note that port 1522 is used for connecting to the database.

5.2. TLS for Query Engine

To start the QE with TLS activated (HTTPS) you need to:

  • Add this line in the [all:vars] section of the inventory. The password can be obfuscated to avoid having clear passwords in the inventory.

    SECURE_COMUNICATION_QE = {"KEYSTORE": "/path/to/lxs-basedir/conf/keystore.jks", "TRUSTSTORE": "/path/to/lxs-basedir/truststore.ts", "KEYSTORE_PASSWORD": "OBF:1v2j1uum1xtv1zej1zer1xtn1uvk1v1v", "TRUSTSTORE_PASSWORD": "OBF:1v2j1uum1xtv1zej1zer1xtn1uvk1v1v"}

    The store files keystore.jks/truststore.ts must always be in $BASEDIR/conf.
    This option is not compatible with SECURE_COMUNICATION_HAP.

  • If all these values are defined, the Ansible-Installer adds the QE properties to the leanxcale-site.xml and starts the HTTPServer with TLS activated.

To create the keystore and the truststore, you need to run the steps of the HAProxy section to generate server.pem. Due to Jetty limitations, the -addext option must not be used, and the hostname is specified in -subj "/CN=<hostname>/…​"


# Export private/public pair server.pem into PKCS12 store with PASSWORD
openssl pkcs12 -export -name lxqeserver -in server.pem -out keystore.p12 -password pass:PASSWORD
# Import PKCS12 into java keystore
keytool -importkeystore -srcalias lxqeserver \
  -srckeystore keystore.p12 -srcstorepass PASSWORD \
  -destkeystore keystore.jks -deststorepass PASSWORD
# Import the client certificate in the truststore
keytool -importcert -v -noprompt -trustcacerts -alias lxqeserver \
  -file client.crt -keystore truststore.ts -storepass PASSWORD
# with this command you can get the obfuscated password
java -cp $BASEDIR/LX-BIN/lib/jetty-util-9.4.28.v20200408.jar org.eclipse.jetty.util.security.Password PASSWORD

To be sure that TLS has been configured and HTTPS is activated for the QE communication:

curl -X OPTIONS --insecure https://localhost:1529

Note that port 1529 is used for connecting to the database.

5.3. TLS for clients

The client side (lxClient, Squirrel, Python…​) will need the TLS certificate deployed if it is auto-generated rather than issued by a trusted Certificate Authority. Usually, this is done by adding it to the OS root CA certificates or to a Java truststore.

  • Add certificate to root CA certificates in Ubuntu OS:

    sudo mkdir /usr/local/share/ca-certificates/extra
    sudo cp client.crt /usr/local/share/ca-certificates/extra/client.crt
    sudo update-ca-certificates
  • Add certificate to root CA certificates in Windows OS:

    Manage the certificate with the Microsoft Management Console app. For an easy installation, double-click the .crt file and then select the Trusted Root Certification Authorities store as the store for installing the certificate. It will show a warning about trusting the CA.

  • Add certificate to root CA certificates in Mac OS:

    Manage the certificate with the Keychain Access app.

  • Alternatively, Java clients may import client.crt into the desired truststore on the client machine

    # import the certificate in truststore
    keytool -importcert -v -trustcacerts -alias lxqeserver -file client.crt -keystore truststore.ts

    To access a truststore other than the JVM default, run Java with the corresponding option. Another option enables Java clients to work without certificate deployment, but then no server identity verification is done.

6. Ports

The ports for every component can be configured; by default, the following ports are used:

Component                                     Default Port

Configuration Manager

Configuration Admin

Commit Sequencer

Snapshot Server

KiVi Metadata Server                          14400

KiVi Datastores

KiVi Proxy                                    9800

Commit Sequencer Logger

Transactional Loggers

Conflict Manager

Query Engine                                  1529

HAProxy (To balance across Query Engines)     1522

LeanXcale Information Server                  9876

Grafana Port for Monitoring Console

All nodes in the cluster must have visibility over all these ports in the different machines.

If you are connecting from outside and want to set security rules in the firewall, you will need the following:

  • For SQL connection through ODBC, JDBC or SQLAlchemy you will need the HAPROXY port open: 1522. Just having this port will do.

  • For clients using the direct KiVi API you will need visibility of ports 9800 (KVPROXY), 14400 (KVMS), 9876 (LX-Information-Server)
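Since the exact host names depend on your deployment, a simple reachability probe from a client machine can confirm the firewall rules. This sketch uses bash's built-in /dev/tcp and a hypothetical host name:

```shell
# Probe the HAProxy SQL port from a client machine (host name is illustrative)
host=lx-master.example.com
port=1522
if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
  echo "port $port reachable"
else
  echo "port $port unreachable"
fi
```

The same probe works for the KiVi ports (9800, 14400, 9876) by changing the port variable.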

7. Appendix A: setting up connections between nodes of a cluster

The Master node of LeanXcale needs to be able to communicate with the rest of nodes. This can be done in two ways: using SSH keys at the operating system level or using them within the LeanXcale inventory.

7.1. Using SSH Key Based Access at the OS

This is a standard OS mechanism in which you configure SSH keys among the hosts in the cluster, so that once the user has access to one host, it is trusted to access any other host in the cluster as the same user.

The right SSH keys should be set for the user starting the cluster, so it can access any machine in the cluster without any further authentication. This way, LeanXcale’s administration scripts can deploy, configure, start and stop the cluster from one host and take action on all the hosts without having to ask for a password for each action.

7.2. Using key files in the LeanXcale inventory

This method is similar, but the key can be set up to be used only by Ansible.

The steps for setting this up are:

  • Generate a key pair

    user@host:~/lxs$ ssh-keygen -t rsa -b 4096 -f lx.pem

    You have to leave the passphrase empty. This will generate 2 files (lx.pem and lx.pem.pub).

  • Set all the hosts to authorize access with the key. For every host in the cluster do:

    user@host:~/lxs$ ssh-copy-id -i user@anotherhost
  • Then configure the inventory file to use this key file. For that, go to BASEDIR/conf/inventory and add the following line:


7.3. Checking SSH Connectivity

You can check whether ansible is able to connect to the nodes you configured in the inventory by running:

user@host:~/lxs$ ansible -i $BASEDIR/conf/inventory -m ping datastores

If the connectivity is OK, you should see a message like this from all the nodes:

 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

8. Appendix B. Troubleshooting

This section covers some common errors that may happen while trying to install LeanXcale.

8.1. admin/ fails

  • Check connectivity as explained in the previous section and solve connectivity issues if there were any.

  • If there were no connectivity issues, note that this step also installs the python dependencies that the operation scripts need. It does so using pip3 and, sometimes, pip3 is installed in a way that fails when run by ansible.

    Connect to the hosts failing and do:

    user@host:~/lxs$ sudo su -
    root@host:/ pip3 install psutil ansible docker pyjolokia
    root@host:/ pip3 install prometheus_client sortedcontainers bloom_filter

After taking these actions, run the admin/ script again

8.2. KVDS and KVMS not starting

KiVi has very few dependencies. The important one is LIBC. You may have a version of KiVi that was compiled with a newer version of LIBC than the one in your OS.

To check it just run:

user@host:~/lxs$ ldd LX-BIN/exe/kvds
	 (0x00007ffd22dda000)
	 => /lib/x86_64-linux-gnu/ (0x00007f995cf79000)
	 => /lib/x86_64-linux-gnu/ (0x00007f995cf6e000)
	 => /lib/x86_64-linux-gnu/ (0x00007f995cf4b000)
	 => /lib/x86_64-linux-gnu/ (0x00007f995cd59000)
	/lib64/ (0x00007f995d0d5000)

In the output above everything is correct, but the command may report problems with one of the dependencies. In that case, there are only two options:

  • Install a newer LIBC

  • Ask LeanXcale Support for another installation package for your LIBC version
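To see which LIBC version your OS provides (for comparison with the version KiVi was built against), you can ask the dynamic linker directly:

```shell
# Print the installed GNU libc version (first line of ldd's version banner)
ldd --version | head -n 1
```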