LeanXcale v2.4 User’s Guide

1. Background

1.1. LeanXcale Components

Before installing, it is important to know that LeanXcale has a distributed architecture and it consists of several components:

  • lxqe: Query engine in charge of processing SQL queries.

  • kvds: Data server of the storage subsystem. There might be multiple instances.

  • kvms: Metadata server of the storage subsystem.

  • lxmeta: Metadata process for LeanXcale. It keeps metadata and services needed for other components.

  • stats: Optional monitoring subsystem to see resource usage and performance KPIs of the LeanXcale database.

  • odata: Optional OpenDATA server to support a SQL REST API.

There are other components used by the system that are not relevant to the user and are not described here. For example, spread is a communication bus used by LeanXcale components.

1.2. LeanXcale Commands

The command lx is a shell for running LeanXcale control programs. It simply sets up the environment for the installed host and runs the command given as an argument:

usage: lx [-d] cmd...

The command operates on the whole LeanXcale system, even when multiple hosts are used.

It is convenient to have lx in the PATH environment variable, as suggested in the install program output.

Command output usually includes information on a per-host basis reporting the progress of the used command.

Most commands follow the same conventions regarding options and arguments. We describe them here as a convenience.

Arguments specify what to operate on (e.g., what to start, stop, etc.). They may be empty to rely on the defaults (the whole DB), or they may specify particular host and/or component names:

  • when only component names are given, only those components will be involved (e.g., lxqe101).

  • when a component type name is given, all components of that type are selected (e.g., lxqe).

  • when a host name is given, any component name that follows is narrowed to that host. If no components follow the host name, all components from that host are selected.

This may be repeated to specify different hosts and/or components.

The special host names db, repl1, and repl2 may be used; they stand for hosts without the nodb attribute, hosts for the first replica, and hosts for the second replica (hosts that are a mirror of other ones), respectively.
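
For example, an illustrative combination (atlantis and orion are host names used in later examples):

# stop only the second replica
unix$ lx stop repl2
# start the kvds components at atlantis and the lxqe components at orion
unix$ lx start atlantis kvds orion lxqe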

1.2.1. LeanXcale Commands on Bare Metal Installs

For bare metal installs, it suffices to have the lx command in the PATH. It can run on any of the installed hosts.

For example, on an installed host, lx version prints the installed version:

unix$ lx version
leanXcale v2.2
    kv         v2.2.2023-09-29.115f5fba70e3af8dc203953399088902c4534389
    QE         v2.2.2023-09-30.1e5933900582.26a7a5c3420cd3d5d589d1fa6cc
    libs       v2.2.2023-09-29.67535752acf19e092a6eaf17b11ad17597897956
    avatica    v2.2.2023-09-27.0b0a786b36e8bc7381fb2bb01bc8b3ed56f49172
    TM         v2.2.2023-09-29.9a9b22cfdc9b924dbc3430e613cddab4ed667a57

1.2.2. LeanXcale Commands on Docker Installs

To use the lx command on a docker install, an installed container must be running, and the command must be called on it.

For example, assume that the container named lx1 is running on a Docker install. The container could be started using the following command, assuming the leanXcale image is named lx:2, and the docker network used is lxnet:

unix$ docker run -dit --name lx1 --network lxnet -p0.0.0.0:14420:14420 lx:2 lx1
b28d30702b80028f8280ed6c55297b2e203540387d3b4cfbd52bc78229593e27

It is possible to attach to the container and use the lx command as can be done on a bare metal host install:

unix$ docker attach lx1
lx1$ lx version
...

Here, we type docker attach lx1 on the host, and lx version on the docker container prompt.

Note that if you terminate the shell reached when attaching to the docker container, the container will stop. Usually, this is not desired.

It is possible to execute commands directly on the running container. For example:

unix$ docker exec -it lx1 lx version

executes lx version on the lx1 container.

In what follows, lx1 is used as the container name in the examples for docker installs.

1.2.3. LeanXcale Commands on AWS Installs

Using lx on AWS hosts is similar to using it on a bare-metal install. The difference is that you must connect to the AWS instance to run the command there.

For example, after installing xample1.aws.leanxcale.com, and provided the PEM file can be found at xample.pem, we can run this:

unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx version

to see the installed version.

In what follows, xample.pem is used as the PEM file name and xample1.aws.leanxcale.com is used as the installed instance name, for all AWS install examples.

2. Start & Stop Particularities on Different Installs

System start depends on how the system has been installed. For bare-metal installations, the administrator installing the system is responsible for adding a system service that brings LeanXcale into operation when the machine starts, and stops LeanXcale before halting the system.
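
A minimal sketch of such a service follows (systemd syntax; the unit file name, install path, user, and timeout are assumptions to be adjusted to the actual install):

# /etc/systemd/system/leanxcale.service (illustrative sketch)
[Unit]
Description=LeanXcale database
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
User=lx
ExecStart=/usr/local/leanxcale/bin/lx start
ExecStop=/usr/local/leanxcale/bin/lx stop
TimeoutStopSec=600

[Install]
WantedBy=multi-user.target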

For AWS installations, LeanXcale is added as a service, disabled by default. Do not use the service on multi-host installs: the service starts/stops the DB, which requires the DB processes to be accessible, and that might not be the case with multiple instances.

When the service is enabled, starting the instance starts the LeanXcale service, and stopping the instance stops LeanXcale before the instance stops.

Otherwise, starting leanXcale requires connecting to one of the installed instances and issuing the lx start command.

For Docker installations, starting a container starts the LeanXcale service on it and, for safety, LeanXcale should be halted before halting the container (otherwise Docker might time out and stop the container before LeanXcale has fully stopped).

3. Licenses

To check the license status or to install a new license, use the lx license command.

For a local installation, just use:

unix$ lx license
	license expires: Mon Dec 30 00:00:00 2024

For docker installs, each container must include its own license. The container does not start the DB unless a valid license is found. However, the container must be running to check the license status and to install new licenses. Refer to the section on starting docker containers for help on that.

For example, to list the license status for the container lx1 we can run

unix$ docker exec -it lx1 lx license
lx1 [
	kvcon[1380]: license: no license file
  failed: failed: status 1
]
failed: status 1

To install a new license, just copy the license file to the container as shown here:

unix$ docker cp ~/.lxlicense lx1:/usr/local/leanxcale/.lxlicense
unix$ docker exec -it lx1 sudo chown lx /usr/local/leanxcale/.lxlicense

The license status should be ok now:

unix$ docker exec -it lx1 lx license
	license expires: Mon Dec 30 00:00:00 2024

4. Licenses at LeanXcale

You can run the kvlicense program to generate a license file:

unix$ kvlicense
usage: kvlicense yymmdd file
unix$ kvlicense 243012 lxlicense
kvlicense[34951]: license saved at lxlicense with limit Fri Jun 12 2026

This binary is not built by default and is not included in the distribution. Ask for help if you need it.

5. Starting the System

5.1. Bare Metal System Start

The start command starts LeanXcale:

unix$ lx start
start...
atlantis [
    cfgile: /ssd/leandata/xamplelx/lib/lxinst.conf...
    bin/spread -c lib/spread.conf ...
    forked bin/spread...
    bin/spread: started pid 1056053
    bin/kvms -D 192.168.1.224!9999 /ssd/leandata/xamplelx/disk/kvms100/kvmeta ...
    forked bin/kvms...
    ...
]
atlantis [
    kvds103 pid 1056084 alive
    kvms100 pid 1056057 alive
    spread pid 1056053 alive
    kvds102 pid 1056075 alive
    kvds100 pid 1056062 alive
    kvds101 pid 1056066 alive

]
unix$

Here, atlantis started a few processes and, once done, the start command checked that the processes are indeed alive.

If not all components can be started successfully, the start command halts the whole LeanXcale system.

By default, no watcher or automatic restart is set up. Flag -r asks start to start the system so that it restarts any QE that was running, failed, and was not restarted less than one minute ago.

Using flag -w asks start to run lx watch. The watch tool waits until the system becomes operational and, upon failures, tries to restart the whole system.

To start a single host or component, use its name as an argument, like in:

# start the given host
unix$ lx start atlantis
# start the named components
unix$ lx start kvds
# start the named components at the given host
unix$ lx start atlantis kvds

Start does not wait for the system to be operational. To wait until the system is ready to handle SQL commands, the status command can be used with the -w (wait for status) flag, as in:

unix$ lx status -w running
status: running

Without the -w flag, the command prints the current status, which can be stopped, failed, running, or waiting.

5.2. Docker System Start

To start LeanXcale installed on Docker containers, you must start the containers holding the installed system components.

For example, consider the default docker install

unix$ lxinst docker
...
install done
docker images:
REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
uxbase       2         7c8262008dac   3 months ago   1.07GB
lx           2         cafd60d35886   3 seconds ago   2.62GB

docker network:
NETWORK ID     NAME      DRIVER    SCOPE
a8628b163a21   lxnet     bridge    local
to start:
	docker run -dit --name lx1 --network lxnet lx:2 lx1

The install process created a docker image named lx:2, installed for the docker host lx1, and the docker network lxnet.

To list the image, we can run:

unix$ docker images lx
REPOSITORY   TAG       IMAGE ID       CREATED              SIZE
lx           2         75b8c9ffa245   About a minute ago   2.62GB

And, to list the networks

unix$ docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
a8628b163a21   lxnet     bridge    local

The created image is a single one for all containers. The name given when creating a container determines the host name used. The install process specified the host names, and containers must be started using the corresponding host name(s), so they know which leanXcale host they are for.

For example, to start the container for lx1:

unix$ docker run -dit --name lx1 --network lxnet -p0.0.0.0:14420:14420 lx:2 lx1
b28d30702b80028f8280ed6c55297b2e203540387d3b4cfbd52bc78229593e27

In this command, the container name is lx1, the network used is lxnet, and the image used is lx:2. The port redirection (-p ...) exports the SQL port to the underlying host.

Listing docker processes now shows the running container:

unix$ docker ps
CONTAINER ID   IMAGE     COMMAND             STATUS          PORTS  NAMES
e81d9d01f40a   lx:2      "/bin/lxinit lx1"   Up 56 seconds   14410  lx1

It is important to know that:

  • starting the container will start leanXcale if a valid license was installed;

  • stopping the container should be done after stopping leanxcale in it.

The container name (lx1) can be used to issue commands. For example,

unix$ docker exec -it lx1 lx version
leanXcale v2.1 unstable
	kv         v2.1.2.14-02-15.c26f496706918e610831c02e99da3676a1cffa47
	lxhibernate v2.1.2.14-02-07.f65c5a628afede27c15c77df6fbbccd6d781d3ee
	TM         v2.1.2.14-02-06.bfc9f92216481dd05f51900ac522e5ccfb6d2555
	QE         v2.1.2.14-02-15.4a8ff4200dc3d3656c8469b6f74c05a296fbdfb3
	avatica    v2.1.2.14-02-14.1c442ac9e630957ace3fdb5c4faf92bb85510099
	...

executes lx version on the lx1 container.

The status for the system can be seen in a similar way:

unix$ docker exec -it lx1 lx status
status: running

Note that the container will not start the DB if no valid license is found.

5.3. AWS System Start

To start LeanXcale installed on AWS, you must start the AWS instances holding the installed system components.

This can be done by hand using the AWS console, or using the lxaws command.

For example, after installing using xample as an AWS tag, this command starts the instances:

unix$ lxaws -start xample

Once instances are started, the lx command is available at any of them.

For example, provided the PEM file can be found at xample.pem, we can run this:

unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx version

to see the installed version. Here xample1 is the DNS host name registered as the first host for the install AWS tag xample. In the same way, xample2 would be the name for the second host, and so on.

6. Bare Metal Starts at LeanXcale

When using replication, you can start one of the replicas (1 or 2):

unix$ lx start repl1

This addressing scheme can be used in many other commands. See the reference for the lx command later in this document for a full description of addressing.

7. Checking System Status

7.1. Bare Metal System Status

The command lx status reports the status for the system or waits for a given status. For example,

unix$ lx status
status: waiting
	kvds100: recovering files
	kvds101: recovering files

Or, to wait until the status is running:

unix$ lx status -v -w running
status: waiting
	kvds100: recovering files
	kvds101: recovering files
status: running
unix$

To see the status for each one of the processes in the system, use lx procs. For example:

unix$ lx procs
procs...
atlantis [
    kvds103 pid 1057699 alive running
    kvms100 pid 1057672 alive running
    spread pid 1057668 alive
    kvds102 pid 1057690 alive running
    kvds100 pid 1057677 alive running
    kvds101 pid 1057681 alive running

]

7.2. Docker System Status

Before looking at the LeanXcale system status, it is important to look at the status of the docker containers running LeanXcale components.

unix$ docker ps
CONTAINER ID   IMAGE     COMMAND             STATUS          PORTS  NAMES
e81d9d01f40a   lx:2      "/bin/lxinit lx1"   Up 56 seconds   14410  lx1

When containers are running, the command lx status reports the status for the system or waits for a given status. For example,

unix$ docker exec -it lx1 lx status

executes lx status on the lx1 container. The status is reported for the whole system, and not just for that container.

To wait until the status is running:

unix$ docker exec -it lx1 lx status -v -w running
status: waiting
	kvds100: recovering files
	kvds101: recovering files
status: running

To see the status for each one of the processes in the system, use lx procs. For example:

unix$ docker exec -it lx1 lx procs
procs...
atlantis [
    kvds103 pid 1057699 alive running
    kvms100 pid 1057672 alive running
    spread pid 1057668 alive
    kvds102 pid 1057690 alive running
    kvds100 pid 1057677 alive running
    kvds101 pid 1057681 alive running

]

7.3. AWS System Status

Before looking at the LeanXcale system status, it is important to look at the status of the AWS instances running LeanXcale components.

This can be done using the lxaws -status flag with the installed AWS tag name:

unix$ lxaws -status xample
#xample.aws.leanxcale.com:
	inst i-02bbf1473c01ea6ae	xample2.aws.leanxcale.com	stopped
	inst i-05e0708c0e4965ef0	xample1.aws.leanxcale.com	54.84.39.77	running

When instances are running, the command lx status reports the status for the system or waits for a given status. For example,

unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx status

to see the system status.

To wait until the status is running:

unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx status -v -w running
status: waiting
	kvds100: recovering files
	kvds101: recovering files
status: running

To see the status for each one of the processes in the system, use lx procs. For example:

unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx procs
procs...
atlantis [
    kvds103 pid 1057699 alive running
    kvms100 pid 1057672 alive running
    spread pid 1057668 alive
    kvds102 pid 1057690 alive running
    kvds100 pid 1057677 alive running
    kvds101 pid 1057681 alive running

]

8. Stopping the System

8.1. Bare Metal System Stop

The stop command halts LeanXcale:

unix$ lx stop
stop...
atlantis [
    kvcon[1056801]: halt
]
atlantis [
    term 1056062 2056066 1056075 1056084 1056057 1056053...
    kill 1056062 2056066 1056075 1056084 1056057 1056053...

]
unix$

8.2. Docker System Stop

Stopping the LeanXcale containers should be done after stopping LeanXcale itself. The reason is that docker might time out the stop operation if the system is too busy updating the disk during the stop procedure.

To stop the database,

unix$ docker exec -it lx1 lx stop

stops the components for the whole system (lx1 being an installed container).

We can double-check this:

unix$  docker exec -it lx1 lx status
status: stopped

Once this is done, we can stop the docker container.

unix$ docker ps
CONTAINER ID   IMAGE     COMMAND             STATUS          PORTS  NAMES
e81d9d01f40a   lx:2      "/bin/lxinit lx1"   Up 56 seconds   14410  lx1
unix$ docker stop lx1
lx1
unix$ docker ps
unix$

We can also remove the container, but note that doing this removes all data in the container as well.

unix$ docker rm lx1
lx1
unix$

8.3. AWS System Stop

To stop leanXcale on AWS, you must stop leanXcale before stopping the AWS instances running it. For example:

unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx stop

This stops the system on all the instances it uses.

Then, the instances can be stopped. This can be done on the AWS console, or using the lxaws -stop flag with the installed AWS tag name:

unix$ lxaws -stop xample
#xample.aws.leanxcale.com:
	inst i-02bbf1473c01ea6ae	xample2.aws.leanxcale.com	stopping
	inst i-05e0708c0e4965ef0	xample1.aws.leanxcale.com	stopping

9. System Recovery

The lxmeta component watches the status of the other components and will stop the system when there is a failure that cannot be recovered online.

Should the system crash or fail-stop, upon a system restart, lxmeta will guide the system recovery.

At start time, each system component checks its on-disk information and decides to start either as a ready component or as a component needing recovery.

The lxmeta process guides the whole system start process following these steps:

  • Wait for all required components to be executing.

  • Look up the component status (ready/recovering).

  • If there are components that need recovery, their recovery process is executed.

  • After all components are ready, the system is made available by accepting queries.

The command lx status can be used both to inspect the system status and the recovery process, and to wait until the recovery process finishes and the system becomes available for queries.

10. Distributed and Replicated Installs

In general, the system is used in the same way as when it is installed on a single host. Refer to the section for the kind of install of interest to learn how to start, stop, and operate the system before reading this section.

As described in the reference manual, most commands take arguments to select particular replicas, hosts, or components. And this is the case for start and stop commands. On replicated installs it is important to start and stop the whole system.

Starting the whole system checks that the replicas are synchronized and takes care of updating outdated metadata of a previously failed or stopped replica.

If a replica is not reachable and start cannot ensure that the system would start with the most recent metadata, the system will not start.

On distributed and replicated installs it is possible to ask start to proceed with just a single replica or a single host or set of components. This is done by calling start with arguments that name just a replica (or perhaps a host or a set of components).

On replicated installs, two useful names are repl1 and repl2, to ask a command to operate on the first or the second replica.

By convention, the first replica is the set of hosts configured that are not mirrors, and the second replica is the set of hosts that are mirrors of former hosts.

As an example, we use this configuration file

# lxinst.conf
host blade110
	kvds
host blade161
	mirror blade110

In this case, the first replica is just blade110 and the second replica is just blade161.

Installing the system on bare metal is done using

unix$ lxinst -f lxinst.conf

To start the system we execute

unix$ lx start
start...
blade110 [
	bin/spread -c lib/spread.conf ...
	forked bin/spread...
	...
]
blade161 [
	bin/spread -c lib/spread.conf ...
	forked bin/spread...
	...
]
unix$

We can ask for the system status or wait for a status as usual:

unix$ lx status
status: running

To stop the system:

unix$ lx stop
stop...
blade110: [
	stop: term lxqe100.r1 pid 460953
	stop: term lxmeta100.r1 pid 460932
	stop: term kvds100.r1 pid 460927
	stop: term kvms100.r1 pid 460923
	stop: term spread pid 460919
]
blade161: [
	stop: term lxqe100.r2 pid 250984
	stop: term lxmeta100.r2 pid 250959
	stop: term kvds100.r2 pid 250955
	stop: term kvms100.r2 pid 250950
	stop: term spread pid 250946
]

10.1. Partial Starts and Stops

When using multiple hosts and replication, it is possible to start and stop individual hosts or replicas and force the system to run using just those.

For example,

unix$ lx stop repl1
stop...
blade110: [
	stop: term lxqe100.r1 pid 443056
	stop: term lxmeta100.r1 pid 443035
	stop: term kvds100.r1 pid 443030
	stop: term kvms100.r1 pid 443026
	stop: term spread pid 443022
]

stops the processes in replica-1.

To start it again, we can proceed in a similar way:

unix$ lx start repl1
blade110 [
	bin/spread -c lib/spread.conf ...
	forked bin/spread...
	bin/spread: started pid 446756
	...
]
blade110 [
	kvds100.r1	pid 446764	alive disk open
	spread	pid 446756	alive
	lxmeta100.r1	pid 446769	alive starting
	lxqe100.r1	pid 446790	alive
	kvms100.r1	pid 446760	alive
]

Stopping a replica while the system is running is strongly discouraged. Using it again requires restoring the replica state to make it work with the rest of the system.

In this example, if the whole system was running when lx stop repl1 was used, starting repl1 again will reintegrate it into the system if possible.

However, if we have a fully stopped system, and run

unix$ lx start repl1

the system will run just the first replica. This will happen even if the second replica is not reachable and there is no way to ensure that the metadata in the first replica is up-to-date.

To ensure that the metadata is up to date in partial starts, use flag -c. This performs the same checks made when starting the whole system, and ensures that metadata is up to date, before attempting a start of the named replica or components.
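
For example, to start just the first replica with those checks enabled:

unix$ lx start -c repl1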

10.2. System Status and Replication

The command lx status reports the status for the system or waits for a given status, as described for other installs in this document. For example,

unix$ lx status
status: running

To see the replication mirror status for the system, use flag -r

unix$ lx status -r
status: running
replica: ok

And, to see detailed information about components, use flag -p, perhaps in addition to -r:

unix$ lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 1173999 running
kvds100.r2 mirror alive snap 1173999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 1173999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 929999 running
lxqe100.r2 mirror alive snap 916999 running

In this example, all components are running with their mirror set as ok.

When some components have failed, or part of the system has been stopped, we can see a different output.

For example, after

unix$ lx stop repl1

We can see

unix$ lx status -rp
status: running
replica: single
kvds100.r2 single alive snap 228678999 single running
kvds100.r1 outdated stopped snap 228161999
kvms100.r1 mirror stopped
kvms100.r2 mirror alive
lxmeta100.r2 mirror alive snap 362692999 running
lxmeta100.r1 mirror stopped snap 108718999
lxqe100.r2 mirror alive snap 362664999 running
lxqe100.r1 mirror stopped snap 227825999

The first thing to note here is that the replica status is not ok, but single. This means we have single processes (without their mirrors) and the system is running in degraded mode.

Also, the kvds100.r2 status with respect to replication is single. This means that it was running while its peer (kvds100.r1, in the first replica) was stopped.

Note how the kvds100.r1 status with respect to replication is outdated. This means it went away (failed or halted) while its peer was still in use. This server will not be used again until it has been brought up to date with respect to the rest of the system.

The same happens to lxqe100.r2, but in this case both query engines could synchronize their mirrors after restarting lxqe100.r1, and nothing else was needed to let the restarted process work with the rest of the system.

10.3. System Metadata and Replication

To inspect the status of a replicated system it is useful to look at the DB metadata as stored on disk. On replicated systems the whole system uses a master metadata server (kvms), which synchronizes metadata with its mirror server.

Looking at the disk information may aid in diagnosing the state for the system when some replica is not running, or has been retired from service.

The dbmeta command can be used to do this. For example, after running

unix$ lx start repl1

on a replicated system previously halted, we can see

unix$ lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 163191999 running
kvds100.r2 mirror stopped snap 110005999
kvms100.r1 mirror alive
kvms100.r2 mirror stopped
lxmeta100.r2 mirror stopped snap 497103000
lxmeta100.r1 mirror alive snap 163670999 running
lxqe100.r1 mirror alive snap 163422999 running
lxqe100.r2 mirror stopped snap 109917999

The system has not been used, so the mirror status is still ok. The output for dbmeta returns what is known by the running kvms server:

unix$ lx dbmeta /srv
dbmeta...
# kvms100.r1
kvms blade110!14400
kvms blade161!14400
kvds ds100.r1 at blade110!14500  snap 172793999 rts 112774999
lxmeta mm100.r2 at blade161!14410  snap 497103000
kvds ds100.r2 at blade161!14500  snap 110005999 rts 109069999
lxmeta mm100.r1 at blade110!14410  snap 172793999
lxqe qe100.r1 at blade110!14420  snap 173016999
lxqe qe100.r2 at blade161!14420  snap 109917999

We asked just for metadata of servers using the /srv resource path name.

The interesting part is that we can ask for metadata as known by both replicas:

unix$ lx dbmeta -a /srv
dbmeta...
blade110:
	# kvms100.r1
	kvms blade110!14400
	kvms blade161!14400
	kvds ds100.r1 at blade110!14500  snap 174236999 rts 112774999
	lxmeta mm100.r2 at blade161!14410  snap 497103000
	kvds ds100.r2 at blade161!14500  snap 110005999 rts 109069999
	lxmeta mm100.r1 at blade110!14410  snap 174236999
	lxqe qe100.r1 at blade110!14420  snap 174457999
	lxqe qe100.r2 at blade161!14420  snap 109917999
blade161:
	# kvms100.r2
	kvms blade161!14400
	kvms blade110!14400
	kvds ds100.r1 at blade110!14500  snap 110005999 rts 109069999
	kvds ds100.r2 at blade161!14500  snap 110005999 rts 109069999
	lxmeta mm100.r2 at blade161!14410  snap 497103000
	lxmeta mm100.r1 at blade110!14410  snap 110005999
	lxqe qe100.r1 at blade110!14420  snap 109920999
	lxqe qe100.r2 at blade161!14420  snap 109917999

It can be seen how replica-2 (that for kvms100.r2) is way out of date at least with respect to snapshots. This is not a surprise because it is stopped.

We can ask for the full metadata using

unix$ lx dbmeta -a

or for that of a particular table or index.

10.4. Failures

When there is a failure, the system continues to operate using the mirror processes that remain alive.

Here we describe example failures, and provide details about repairing specific failed components. Then we describe how to use lx recover to try to restore things in a more convenient way.

For example, if qe100.r2 fails (we killed it to make it so), this can be seen:

unix$ lx status
status: running with failures

Further details are reported by flags -r (replication) and -p (process):

unix$ lx status -rp
status: running with failures
replica: ok
kvds100.r1 mirror alive snap 232084999 running
kvds100.r2 mirror alive snap 232084999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 232564999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 232416999 running
lxqe100.r2 mirror dead snap 228005999

Component lxqe100.r1 is alive and running, and lxqe100.r2 is dead.

Using the database now produces a change in status:

lxqe100.r1 single alive snap 267003999 single running
lxqe100.r2 outdated dead snap 228005999

This means that lxqe100.r1 is known to be single, i.e., it has been used while its mirror was dead or halted.

Also, lxqe100.r2 is known to be outdated, i.e., its mirror has been used while it was dead or halted.

10.5. Restoring LXMETA Failures

Recovering from lxmeta failures is trivial because the component simply rebuilds its state from the running system.

10.6. Restoring KVMS Failures

Recovering from kvms failures requires making sure that when the system starts, the new master has the most recent metadata on disk.

kvms servers keep the metadata synchronized, and there is nothing special to be done to recover them from a failure as long as they can reach the current master server or the disk data for the new master is up to date.

For example, if the master kvms dies, we see this as the status:

unix$ lx status -rp
status: running with failures
replica: ok
kvds100.r1 mirror alive snap 100079999 running
kvds100.r2 mirror alive snap 99122999 running
kvms100.r1 mirror dead
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 100079999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 99912999 running
lxqe100.r2 mirror alive snap 99449999 running

The system continues to operate using kvms100.r2.

To recover at this point, it suffices to restart the failed kvms:

unix$ lx start kvms100.r1

It takes its state from kvms100.r2, which is the current master, and the system is ok.

However, if the system stops before updating the kvms100.r1 disk with the (possibly newer) metadata, it can happen that, on the next restart, it becomes the new master while holding old data, leading to problems.

When used to start the whole system, lx start takes care of updating the metadata on disk for the kvms components with that from the previous master (the one with the newest timestamps in it).

When starting components by hand or by individual hosts, the kvms data should be updated on disk for the replicas with the data from the newest one.

That is, unless the server keeping the previous kvms master is unreachable, running

unix$ lx start

suffices to use up to date metadata despite previous kvms failures.

But explicitly starting kvms100.r1 after its failure, without updating its disk metadata, while the rest of the system is stopped or unreachable, will lead to an execution with old system metadata, leading to problems.

This is only important when there are failures or explicit stops for parts of the system. Otherwise, all metadata is synchronized and there should be no problem.

10.7. Restoring QE Replica Failures

Doing a system stop and a system start should recover failed query engines without doing anything else. But this is not always possible or desirable.

In the case of query engine failures, we can recover them while the system is running by starting the failed query engine:

unix$ lx start lxqe100.r2

Asking for the status for replica and processes yields:

unix$ lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 309358999 running
kvds100.r2 mirror alive snap 309358999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 309358999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 309900999 running
lxqe100.r2 mirror alive snap 309253999 running

If the system has been running for too long while a query engine was stopped, it may be better to recover the failed query engine's disk state from its live mirror.

The following command recreates the disk for lxqe100.r2 from the disk of lxqe100.r1. The argument order is similar to cp, i.e., source and then destination.

unix$ lx copy lxqe100.r1 repl
copy blade110 lxqe100.r1 blade161 lxqe100.r2

Here, it is important to use repl as the (source or) target argument name, to let copy know that it is updating a replica.

Use lx copy with care. It blindly copies the disk from one component to another, and you might overwrite data in the process.

Flag -n makes the command report what it would do, and it is sensible to use it before actually copying anything. More details are printed using flag -v:

unix$ lx copy -nv lxqe100.r1 repl
#disk...
copy blade110 lxqe100.r1 blade161 lxqe100.r2
copy local disk/lxqe100.r1/log blade161 disk/lxqe100.r2/logm
copy local disk/lxqe100.r1/logm blade161 disk/lxqe100.r2/log

Here, log is the log data and logm is the mirror kept for the mirror QE; they are exchanged by the copy to make lxqe100.r2 a replica of lxqe100.r1.

10.8. Restoring KVDS Replica Failures

In the case of a kvds failure, bringing it back into operation requires repairing it by updating its data with that from its mirror.

For example, using our example install and killing kvds100.r2 leads to this status:

unix$ lx status -rp
status: running with failures
replica: single
kvds100.r1 single alive snap 5735999 single running
kvds100.r2 outdated dead snap 3839999
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 6688999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 6602999 running
lxqe100.r2 mirror alive snap 6599999 running

Here, kvds100.r1 continued to run (it has a single status) and kvds100.r2 failed while its mirror was running (it has an outdated status). Because of this, the whole system replication status is now single.

Starting kvds100.r2 by hand does not change things. Its state is probably out of date with respect to its mirror, and it is not integrated into the running system until restored.

unix$ lx start kvds100.r2
...
unix$ lx status -rp
status: running
replica: single
kvds100.r1 single alive snap 12421999 single running
kvds100.r2 outdated alive snap 13379999 disk open
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 13379999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 13282999 running
lxqe100.r2 mirror alive snap 13281999 running

Stopping the system still preserves the status for the single and failed server:

unix$ lx stop
...
unix$ lx dbmeta -a /srv
dbmeta...
blade110:
	# kvms100.r1
	kvms blade110!14400
	kvms blade161!14400
	lxmeta mm100.r2 at blade161!14410
	kvds ds100.r1 at blade110!14500 single snap 16247999
	kvds ds100.r2 at blade161!14500 fail snap 16247999
	lxmeta mm100.r1 at blade110!14410  snap 16247999
	lxqe qe100.r1 at blade110!14420  snap 16147999
	lxqe qe100.r2 at blade161!14420  snap 16146999
blade161:
	# kvms100.r2
	kvms blade161!14400
	kvms blade110!14400
	lxmeta mm100.r2 at blade161!14410
	kvds ds100.r1 at blade110!14500 single snap 16247999
	kvds ds100.r2 at blade161!14500 fail snap 16247999
	lxmeta mm100.r1 at blade110!14410  snap 16247999
	lxqe qe100.r1 at blade110!14420  snap 16147999
	lxqe qe100.r2 at blade161!14420  snap 16146999

Restarting the system at this point will force a recovery from the DB log data, because there was a single kvds and its mirror must be updated before it can run user operations.

unix$ lx start
...
unix$ lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 18023999 running
kvds100.r2 mirror alive snap 18023999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 18023999 running
lxmeta100.r2 mirror alive running
lxqe100.r2 mirror alive snap 17930999 running
lxqe100.r1 mirror alive snap 17919999 running

When the kvds server has been down for too long, it is better to restore its disk from its surviving mirror before restarting the system.

Consider the system after kvds100.r2 failed, and it was stopped using lx stop. Instead of using lx start directly, we can use lx copy to restore the obsolete and failed kvds100.r2:

unix$ lx copy kvds100.r1 repl
copy blade110 kvds100.r1 blade161 kvds100.r2

Copying a kvds disk can take a long time.

10.9. Using the recover command

The command lx recover inspects metadata and tries to restore the disk for failed components.

It can be used before starting the system, so that lx start brings up an already restored replicated system.

As an example, with the example system running, we killed both kvds100.r1 and lxqe100.r2 and stopped the system after that. This is the resulting server metadata:

unix$ lx dbmeta /srv
dbmeta...
# kvms100.r1
meta ts 15264000
kvms blade110!14400
kvms blade161!14400
lxmeta mm100.r2 at blade161!14410
kvds ds100.r1 at blade110!14500 fail snap 2101999
kvds ds100.r2 at blade161!14500 single snap 14804999
lxmeta mm100.r1 at blade110!14410  snap 15264000
lxqe qe100.r1 at blade110!14420 single snap 14736999
lxqe qe100.r2 at blade161!14420  snap 925999

A dry run (flag -n) for lx recover shows this:

unix$ lx recover -n
#disk...
copy blade110 kvds100.r2 blade161 kvds100.r1
clear kvds100.r2 flag single
clear kvds100.r1 flag fail
copy blade110 lxqe100.r1 blade161 lxqe100.r2
clear lxqe100.r1 flag single
clear lxqe100.r2 flag fail

We can use lx copy and lx dbmeta -w to copy disks and clear flags by hand, but it is more convenient to run this command without the dry-run flag once we decide to do so.

The recover command can take arguments to select what should be recovered, following the style of most other commands. For example:

unix$ lx recover kvds100
#recover...
copy blade110 kvds100.r2 blade161 kvds100.r1
clear kvds100.r2 flag single
clear kvds100.r1 flag fail

After a recover, the system can be started normally.

11. Running SQL Queries

To use the database with the standard SQL client, you can use this command supplying the user name and secret:

unix$ lx sql -n lxadmin -p ****
lx% !tables
...

It is suggested not to give the password on the command line, but to open the connection later within the sql prompt, to prevent the password from being listed in the process list of the underlying host. The above is just an example.

The isolation mode can be set using the -t argument, as in:

unix$ lx sql -Msession -n lxadmin -p **** -t rawrc
lx% !tables
...

LeanXcale isolation levels are (from strongest to weakest):

  • snapshot_isolation or si: standard snapshot isolation, that is, when the transaction starts it gets the current snapshot and all reads are performed over that snapshot.

  • read_committed or rc: each SQL statement gets the current snapshot and reads are performed from that snapshot.

  • raw_read_committed or rawrc: each read performed by an SQL statement gets the latest committed value.

  • loader: special isolation level to accelerate database loads. It is like read committed but it performs neither conflict detection nor logging. This can only be used when no other transactions are accessing the database.

Communication between the SQL client and the query engine(s) is not encrypted by default. Install using the tls property to ask for encrypted communications.

The lx sql command does this on its own, but, when using a connection URL on a standard JDBC client, add the

tls=yes

property to the connection property set. This tells the leanXcale client driver to use TLS.

It is possible to supply the isolation mode on the URL using

mode=read_committed

as a property in the URL. Note, however, that when this is used with lx sql, the console will still overwrite the URL mode with either its default mode or the one set using the -t flag.

It is possible to supply extra addresses when multiple servers are available, for load balancing and for HA. The lx sql command does this on its own, but you may supply extra addresses in the xaddr property, separated by commas. For example

xaddr=host2:14420,host3:14420

adds two extra server addresses besides the one given in the URL.
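
As a sketch, a standard JDBC client could supply these properties through a java.util.Properties object (the host names, database name, class name, and credentials below are placeholders; whether a given property goes in the URL or in the property set may depend on the client):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "lxadmin");
        props.setProperty("password", "secret");
        props.setProperty("tls", "yes");                        // encrypt the connection
        props.setProperty("mode", "read_committed");            // isolation mode
        props.setProperty("xaddr", "host2:14420,host3:14420");  // extra addresses for HA/load balancing
        // URL follows the template jdbc:leanxcale://{host}:{port}/{database}
        try (Connection conn = DriverManager.getConnection("jdbc:leanxcale://host1:14420/db", props)) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}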

11.1. Using DBeaver

To use DBeaver with leanXcale, just use the standard JDBC driver and configure the URI to access the installed system.

For example, download the JDBC driver for this version from Mvn central (leanXcale drivers). Usually, you want the latest driver. As of now, it is the leanXcale JDBC driver 3.2. Then follow these steps:

  • Add the driver at Database, Driver Manager

  • At settings, use com.leanxcale.client.Driver as the driver

  • At settings, use jdbc:leanxcale://{host}:{port}/{database} as the URL template.

  • At settings, use 14420 as the default port

  • At settings, use db as the default database.

  • At the libraries tab, use add file and use the path to the downloaded JDBC driver.


At this point you can add the connection using the DB type just configured and the user and password as needed.

12. Configuring the System

The lx config command prints or updates the configuration used for the LeanXcale system:

unix$ lx config
cfgile: /usr/local/leanxcale/lib/lxinst.conf...
#cfgfile /usr/local/leanxcale/lib/lxinst.conf
host localhost
    lxdir /usr/local/leanxcale
    JAVA_HOME /usr/lib/jvm/java-1.11.0-openjdk-amd64
    addr 127.0.0.1
    odata 100
        addr localhost!14004
    kvms 100
        addr 127.0.0.1!14400
    lxmeta 100
        addr 127.0.0.1!16500
    lxqe 100
        addr 127.0.0.1!16000
    kvds 100
        addr 127.0.0.1!15000
    kvds 101
        addr 127.0.0.1!15002
    kvds 102
        addr 127.0.0.1!15004
    kvds 103
        addr 127.0.0.1!15006

The configuration printed provides more details than the configuration file (or command line arguments) used to install.

NB: when installing into AWS or docker, the configuration printed might lack the initial aws or docker property used to install it.

It is possible to ask for particular config entries, as done here:

unix$ lx config kvms addr

It is also possible to adjust configured values, like in

unix$ lx config -s lxqe mem=500m

used to adjust the lxqe mem property to 500m in all configured lxqe components.
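
The change can then be checked by asking for that entry, in the same way as shown above:

unix$ lx config lxqe mem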

13. System Logs

Logs are kept in a per-system directory named log, located at the install directory for each system.

Log files have names similar to

	kvms100.240214.1127.log

Here, the component name comes first, followed by the date and time when the log file was created. When a log file becomes too big, a new one is started for the given component.

It is convenient to use the lx logs command to list and inspect logs. It takes care of reaching the involved log files on the involved hosts.

For example, to list all log files:

	unix$ lx logs
	logs...
	atlantis: [
		log/kvds100.240214.1127.log	250.00K
		log/kvms100.240214.1127.log	35.39K
		log/lxmeta100.240214.1127.log	24.10K
		log/lxqe100.240214.1127.log	445.80K
		log/spread.240214.1127.log	963
		log/start.log	426
	]

Here, atlantis was the only system installed.

We can give host and/or component names as in many other commands to focus on those systems and/or components.

For example, to list just the logs for kvms processes:

	unix$ lx logs kvms
	logs...
	atlantis: [
		log/kvms100.240214.1127.log	35.39K
	]

Or, to list only those for the kvms100:

	unix$ lx logs kvms100

To list logs for the atlantis host:

	unix$ lx logs atlantis

To list logs for kvms components within atlantis:

	unix$ lx logs atlantis kvms

To list logs for kvms components at atlantis and kvds at orion:

	unix$ lx logs atlantis kvms orion kvds

With flag -p, logs are printed in the output instead of being listed.

	unix$ lx logs -p lxmeta
	atlantis: [
		log/lxmeta100.240214.1127.log	24.10K [
			# pid 3351755 cmd bin/javaw com.leanxcale.lxmeta.LXMeta -a atlantis!14410
	...

Flag -g greps the logs for lines with the given expression. For example:

	unix$ lx logs -g fatal

And, flag -c copies the logs to the given directory

	unix$ lx logs -c /tmp

When printing and copying the logs, only the last log file for each component is used. To operate on all the logs and not just the last one, use flag -a too:

	unix$ lx logs -a -c /tmp

14. Logs and Diagnostics

In this section we describe some information that can be found in the system log files.

14.1. Authentication Diagnostics

The lxqe log files include authentication information that can be checked out when in doubt.

Lines like

2024-02-16 08:45:08,752 INFO lxqe: authenticate: lxadmin: local: yes

report that the user lxadmin (or whoever it was) was authenticated by the local (i.e., the DB) user entry.

When using LDAP, the line would be

2024-02-16 08:45:08,752 INFO lxqe: authenticate: db-USR1: ldap: yes

This reports both the database used (db) and the user involved (USR1).

Authentication errors are reported in a similar way:

2024-02-16 08:48:09,992 INFO lxqe: authenticate: db-USR2: ldap: no:
	[LDAP: error code 49 - Invalid Credentials]

We folded the line to make it easier to read.

14.2. Login and Auditing

The audit log file contains auditing entries, including user authentication.

These can be enabled using the AUDIT SQL statement, although authentication auditing is always enabled and cannot be disabled.

For example, to locate authentication entries for the user lxadmin, the flag -g (for grep) can be used as shown here:

	unix$ lx logs -g auth.*lxadmin audit
	logs...
	atlantis: [
		log/audit.240221.0708.log	1.75K [
		1:	2024-02-21 07:08:10 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		2:	2024-02-21 07:08:12 lxqe100: audit: auth: db-USR2: 127.0.0.1: local: yes
		3:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		4:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		5:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		...
		]
	]

The user names include both the database and the user name, as a convenience. An exception to this rule is lxadmin.

Authentication failures are reported with lines like

2024-02-29 06:40:52 lxqe100: audit: auth: db-FOO: 127.0.0.1:39018: local: no:auth failed

If user or table auditing is enabled on a per-session basis, read, write, and change (alter) accesses are reported once per session. For example, after executing any of these:

	AUDIT TABLE MYTBL
	AUDIT USER USR1
	AUDIT USER USR1 TABLE MYTBL

the audit log file will include reports like:

2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: /db/USR1/tbl/MYTBL

More expensive auditing reports the statements executed (even if they fail), and can be enabled using, for example

	AUDIT USER USR1 BY ANY STATEMENT

Although it is more sensible to audit just READ, WRITE, or DDL statements instead of auditing all of them.

In this case, the audit log will contain lines like the following, and it may grow quickly.

2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: SELECT count(*) FROM MYTBL

When permission to execute a statement is denied, an audit record is added if any audit has been enabled:

2024-02-29 06:40:51 lxqe100: audit: perm: db-TEST1: /db/APP/tbl/PERSONS:
	select on table: permission denied

The line has been folded for readability, although it is a single line in the audit log.

It is important to know that the lxqe log file reports failed permission checks even when auditing is disabled. There is no need to enable auditing just to search for failed permission checks.
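
For example, a search along these lines could be used to locate failed permission checks in the lxqe logs (the exact message text to grep for is an assumption based on the audit example above):

unix$ lx logs -g 'permission denied' lxqe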

15. Backups

Backups can be made to an external location (recommended, to tolerate disk failures) or within an installed host. External backups (i.e., to an external location) are made using the lxbackup tool, installed in the bin directory, which works with a given configuration file. Internal backups (i.e., to a directory on installed hosts) are made using the lx backup tool instead.

usage: lxbackup [-h] [-v] [-D] [-n] [-f cfgfile] [-d dir] [-i] [-e] [-r] [-o]
                [-F]
                [what [what ...]]

make a backup

positional arguments:
  what        host/comps

optional arguments:
  -h, --help  show this help message and exit
  -v          verbose
  -D          enable debug diags
  -n          dry run
  -f cfgfile  config for the install
  -d dir      root backup dir
  -i          incremental backups
  -e          encrypt
  -r          restore
  -o          online backup
  -F          force command

Using lxbackup is exactly like using lx backup with a few differences:

  • Flag -f is mandatory on the first invocation and provides the installed configuration file.

  • The default backup directory is not $LXDIR/dump, but ./lxdump.

  • The command kvtar must be available at the host running lxbackup.

The external host must have ssh access to the installed hosts.

The configuration file used must be the one retrieved using lx config, and not the one written by the user to perform the install, because lx config reports addresses and details needed by the command.

Once a backup has been created, the configuration used is saved along with the backup data and there is no need to use -f to supply it.

Encryption/decryption happens at the source/destination of data when creating/extracting backups. Therefore, there is no key kept at the host running lxbackup.

To prepare a host to use lxbackup, get the lxbackup command, the configuration, and the kvtar command, and copy them to the host. You might also want to copy lxrestore and lxbackups.

These commands can be found in the installed $LXDIR/bin directory on any installed host. If you are not sure about the $LXDIR value, use this command to find it:

unix$ lx -d pwd
/usr/local/leanxcale
unix$

Flag -d for lx makes it change to the $LXDIR directory before running the given command.

The detailed configuration file to be used is kept at $LXDIR/lib/lxinst.conf. The configuration can be retrieved using the lx config command. For example, this creates ./lxinst.conf with the detailed configuration:

unix$ lx config -o lxinst.conf
saved lxinst.conf

As an example, we can set up an external host named orion to perform backups in this way:

unix$ lx -d pwd
/usr/local/leanxcale
unix$ lx config -o /tmp/lxinst.conf
saved /tmp/lxinst.conf
unix$ cd /usr/local/leanxcale/bin
unix$ scp lxbackup lxbackups lxrestore kvtar /tmp/lxinst.conf orion:~

And then just:

orion$ lxbackup -f lxinst.conf

To create a full, cold, backup when leanXcale is not running:

orion$ lxbackup -f lxinst.conf
#disk...
new 240809
make atlantis disk/kvms100 /usr/local/leanxcale/dump/240809/kvms100
make atlantis disk/kvds100 /usr/local/leanxcale/dump/240809/kvds100
make atlantis disk/kvds101 /usr/local/leanxcale/dump/240809/kvds101
make atlantis disk/lxqe100/log /usr/local/leanxcale/dump/240809/lxqe100
unix$

Flag -f must be used at least the first time. Once a backup has been made, the configuration is saved and there is no need to supply it again.

The printed name is the name of the directory keeping the backup, as used when restoring it. In this case, 240809.

The files kept in the backup are compressed, and must be uncompressed if copied by hand. The restore command takes care of this.

Using flag -e both encrypts and compresses the backup files. (The flags belong to the underlying disk command; backup is the argument given to disk.)

The key used to encrypt the backed files is kept at $LXDIR/lib/lxkey.pem on the installed sites as set up by the installer.

orion$ lxbackup -e
#disk...
new 24080901
make atlantis disk/kvms100 /usr/local/leanxcale/dump/24080901/kvms100 crypt
make atlantis disk/kvds100 /usr/local/leanxcale/dump/24080901/kvds100 crypt
make atlantis disk/kvds101 /usr/local/leanxcale/dump/24080901/kvds101 crypt
make atlantis disk/lxqe100/log /usr/local/leanxcale/dump/24080901/lxqe100 crypt

To perform a backup while the system is running, supply flag -o (for online).
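
For example, an online backup from the external host:

orion$ lxbackup -o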

15.1. Incremental Backups

To create an incremental backup, use the flag -i of lxbackup (or the same flag with lx backup for an internal backup).

orion$ lxbackup -i
#disk...
new 24080902+
incr: lxqe100 += lxqe100

The output reports which components got files backed up.

The incremental backup is encrypted if the total backup it refers to is encrypted too. No flag -e should be given.

The incremental backup can be performed while the system is running.

15.2. Listing Backups

To list the known backups, use lxbackups (or lx backups for internal backups).

orion$ lxbackups
240810 ts 1419000
24081002+ ts 1419000
...

Those with a + in their names are incremental backups.

Verbosity can be increased by adding one or more -v flags:

orion$ lxbackups -v
#disk...
240810 ts 1419000
240810/kvms100 ts 2551000
240810/kvds100 ts 1292000
...
orion$ lxbackups -vv
#disk...
240810 ts 1419000
240810/kvms100 ts 2551000
240810/kvms100/kvmeta sz 239 ts 2551000 mt 1723285636
240810/kvds100 ts 1292000
240810/kvds100/dbf00001.kv sz 393216 ts 1419000 mt 1723285636
...
orion$ lxbackups -vvv
#disk...
240810 ts 1419000
	240810/kvms100 ts 2551000
		240810/kvms100/kvmeta sz 239 mt 1723285636
			localhost
			ts 1 2551000 bck 0 0 na 0
	240810/kvds100 ts 1292000
		240810/kvds100/dbf00001.kv sz 393216 mt 1723285636
			localhost
			ts 1292000 1419000 bck 0 0 na 3
			tname db-APP-PERSONS
			rmin: tpmin
			rmax: tpmax
...

It is possible to list a single backup by supplying its name and/or specific components, using arguments as done with most lx commands:

orion$ lxbackups 24080901

or

orion$ lxbackups 24080901 lxqe

15.3. Removing Old Backups

To remove a backup, it suffices to remove its directory. For example:

unix$ lx -d rm -rf dump/230720

Here we used the flag “-d” for lx to change to $LXDIR before executing the remove command, which makes it easy to name the directory used for the dump.

Or, from our external backup example host:

orion$ rm -rf lxdump/230720.2

Beware that if you remove a backup, you should also remove the incremental backups that follow it, up to the next total backup.

15.4. Restore

To restore a backup, use the lxrestore command (or lx restore for internal backups).

orion$ lxrestore -v
#disk...
check...
restore 24080904...
restore atlantis disk/kvds100 from /usr/local/leanxcale/dump/24080904/kvds100 crypt
dbf00001.kv
dbf00003.kv
...

Do this while the system is stopped. By default, it selects the last backup made.

To restore a particular backup, supply its name:

orion$ lxrestore 24080904

This can also be done for incremental backups. When restoring an incremental backup, the restore also takes data from previous incremental backups and from the previous total backup.

To restore only specific hosts or components, supply their names or types as done for other commands:

orion$ lxrestore lxqe

Or perhaps:

orion$ lxrestore 24080904 lxqe

Before restoring a backup, it is usually desirable to format the disks of the involved components, and then use restore to restore their contents.

15.5. Backup Automation

To automate system backups, use crontab(8) to run lx backup (when backing up within an installed host) or lxbackup (at an external host) at the desired times.
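
As a sketch, crontab entries on the external backup host might look like this (times, paths, and the backup directory are assumptions; lxbackup is assumed to have been run once with -f so the configuration is already saved):

# weekly full online backup on Sundays, nightly incremental backups otherwise
0 2 * * 0   /home/lx/lxbackup -o -d /backups/lx
0 2 * * 1-6 /home/lx/lxbackup -i -d /backups/lx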

16. Asynchronous Replication from Another LeanXcale Instance

It is possible to configure an install to make it pull changes from a remote one. In this case, the pulling system will fetch transactions made on the source system and apply them as well.

Applied transactions are not subject to conflict checks and the like, because the aim is to update the target system with respect to the source one, and the source one did already perform those transactions.

The target keeps trying to reach the source system and, when connected, pulls changes and applies them. Should there be any error during an apply, it is considered fatal and the query engine where it happened will stop and signal the error.

If there are more query engines, the system might still continue running, depending on how it has been configured. But it is suggested to use a single query engine on a system pulling changes.

As an example, this configuration file pulls changes from a system installed at hosts orion and rigel.

#cfgfile
host atlantis
	lxqe
		LXPULL	orion!14420;rigel!14420

To learn the addresses for the query engines of a particular install, use lx config:

orion$ lx config

It is important to add all the addresses for query engines, or some transactions in the source system might be missed.

Once running, the lx status command can show that the system is pulling. Use the flag -p to see the status for each process.

atlantis$ lx status -p
status: running
kvds100 alone alive snap 66709999 running
kvms100 alone alive
lxmeta100 alone alive snap 66709999 running
lxqe100 alone alive snap 66736999 pulling from  orion!14420 rigel!14420

Here, we can see that lxqe100 is pulling from a couple of remote query engines.

To stop pulling for a while, you can use a control request for the pulling query engine. For example:

atlantis$ lx kvcon ctl qe100 pull stop

The status line for lxqe100 should now say not pulling.

To start pulling again, there is a similar control request:

atlantis$ lx kvcon ctl qe100 pull start

To change the addresses, although another control request (not shown here) can be used, it is usually better to stop the pulling system, change the configuration, and restart it.

This can be done using lx config with the -s flag to set a property in the configuration. For example:

atlantis$ lx config -s lxqe100 'LXPULL=blade161!14424'

changes the configuration as it can be seen:

atlantis$ lx config
#cfgfile lib/lxconfig.conf
size small
host atlantis
	kvds 100
		addr atlantis!14500
		mem 1024m
	kvms 100
		addr atlantis!14400
	lxmeta 100
		addr atlantis!14410
	lxqe 100
		addr atlantis!14420
		mem 1024m
		LXPULL blade161!14424

17. Forwarding System Updates

It is possible to execute a user program to process or forward every update made to the data on the installed system.

The program processing changes must be installed in the user bin directory with the name fwd, or as a JAR file in the user lib directory with the name fwd.jar.

The program runs with the library, python, and class paths set to include the user lib directory, where extra libraries can be installed as well. Its current directory is the install directory at the running location. Temporary files should be created in ./tmp. Any diagnostics should be printed on the standard error stream, as every other command does. The program output and errors are saved in log files, as done for other system components, and can be inspected with the lx logs command.

The program takes the option -f (follow) to keep on following changes, and the name of the directory where to look for files named changes* reporting updates. It should remove the processed files as soon as they have been processed.

For Java, the class must be exactly com.leanxcale.usr.Changes and supply a main method.

Changes files might be removed if not processed after many other changes have been reported, to avoid disk usage problems.
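
As an illustrative sketch only, a minimal fwd.jar entry point following the contract above could look like this (the contents of the changes* files, the forwarding destination, and the polling used to follow new files are assumptions):

package com.leanxcale.usr;

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Changes {
    public static void main(String[] args) throws Exception {
        // -f (follow) keeps processing new changes files; the last argument names
        // the directory where files named changes* report updates
        boolean follow = args.length > 0 && args[0].equals("-f");
        Path dir = Paths.get(args[follow ? 1 : 0]);
        Path sink = Paths.get("/var/lxchanges");   // hypothetical destination
        do {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "changes*")) {
                for (Path f : files) {
                    System.err.println("fwd: processing " + f);  // diagnostics go to stderr
                    // forward (or otherwise process) the raw file, then remove it
                    Files.copy(f, sink.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                    Files.delete(f);
                }
            }
            if (follow) Thread.sleep(1000);        // poll again when following
        } while (follow);
    }
}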

To forward changes, set the global fwd attribute to yes in the configuration, and add the forward program to the installed system as shown later.

When the system starts, the forward program will be started for each of the installed data servers, to forward or process its changes, and such data servers will report their changes.

For example, with an xample.jar file including a Changes class reporting changes, and with an installed and stopped system, we can:

Rename the file to fwd.jar and add it to the user library directory:

unix$ mv xample.jar fwd.jar
unix$ lx add fwd.jar ulib
cp: fwd.jar local:/usr/local/leanxcale/ulib

Here, ulib is the user library directory where the fwd.jar file is installed. Should the program be an executable file, we would instead use:

unix$ lx add fwd ubin

If the system was not configured to forward changes when installed, we can enable forwarding by setting the fwd attribute with the command:

unix$ lx config -s 'fwd=yes'

This sets the global fwd property to yes in the installed configuration.

The next time the system starts, the data servers will report changes as dictated by the fwd property, and the lx start command will start the forward processes by running fwd.jar.

To temporarily stop the forwarding processes, use lx stop:

unix$ lx stop fwd

To restart forwarding, use lx start:

unix$ lx start fwd

To update the forward program, stop forwarding, add the program as shown above, and start forwarding again.

18. Using MFA with PAM

Each user requiring MFA must also configure their authenticator app (for instance the Google Authenticator app for Android). In each user’s home directory, run the google-authenticator command, specifying the time-based option:

unix$ google-authenticator -t

It will create a new secret key in $HOME/.google_authenticator and display the new key both as a series of hexadecimal digits and as a QR code, to be entered into the authenticator app. When using lx sql, in addition to the user name (with -n) and password (with -p), the current authenticator value should be given with the -P flag:

unix$ lx sql -n username -p password -P 123456

where 123456 should be replaced by the current code from the authenticator app.

19. Adding Hosts on LeanXcale Installs

To add more hosts to an existing install, edit the configuration file to add the extra hosts. The command:

unix$ lx config

can print the configuration if it is no longer available.
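
For example, a sketch of the edited configuration adding a host named blade123 with one data server (the existing host and component layout shown here is an assumption):

# lxinst.conf
host blade110
	kvds
host blade123
	kvds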

Once the new host(s) are added, run the install program to update the new hosts (and install them):

unix$ lxinst -u -f lxinst.conf

And finally, start the DB on the new host(s). For example, if we added a host named blade123, we can run:

unix$ lx start blade123

20. Reporting Issues

To report an issue, use lx report to gather system information. This program collects information from the system and builds an archive to be sent to support.

unix$ lx report
report: lxreport.231009...
version...
procs...
logs...
stacks...
stack lxmeta100...
stack kvds103...
stack kvms100...
stack spread...
stack kvds102...
stack kvds100...
stack kvds101...
stack lxqe100...

# send this file to support.
-rw-rw-r-- 1 leandata leandata 54861 Oct  9 14:58 lxreport.231009.tgz

As printed by the command output, the resulting tar file should be sent to support.

The archive includes:

  • installed version numbers

  • underlying OS names and versions

  • complete disk usage for the installed systems

  • complete process list for the installed systems

  • memory usage for the installed systems

  • lx process list

  • logs for components (last log file only, for each one)

  • stacks for each component

  • stacks for each core file found

When kvms is still running, the archive includes also:

  • statistics for the system

  • long list of kv resources

  • process list for each kvds

  • file list for each kvds