LeanXcale v2.4 User’s Guide
This document is the user's guide for the LeanXcale system administrator.
- 1. Background
- 2. Start & Stop Particularities on Different Installs
- 3. Licenses
- 4. Licenses at LeanXcale
- 5. Starting the System
- 6. Bare Metal Starts at LeanXcale
- 7. Checking System Status
- 8. Stopping the System
- 9. System Recovery
- 10. Distributed and Replicated Installs
- 11. Running SQL Queries
- 12. Configuring the System
- 13. System Logs
- 14. Logs and Diagnostics
- 15. Backups
- 16. Asynchronous Replication from Another LeanXcale Instance
- 17. Forwarding System Updates
- 18. Using MFA with PAM
- 19. Adding Hosts on LeanXcale Installs
- 20. Reporting Issues
1. Background
1.1. LeanXcale Components
Before installing, it is important to know that LeanXcale has a distributed architecture and it consists of several components:
- lxqe: Query engine in charge of processing SQL queries.
- kvds: Data server of the storage subsystem. There might be multiple instances.
- kvms: Metadata server of the storage subsystem.
- lxmeta: Metadata process for LeanXcale. It keeps metadata and services needed for other components.
- stats: Optional monitoring subsystem to see resource usage and performance KPIs of the LeanXcale database.
- odata: Optional OpenDATA server to support a SQL REST API.
There are other components used by the system that are not relevant to the user and are not described here. For example, spread is a communication bus used by LeanXcale components.
1.2. LeanXcale Commands
The command lx is a shell for running LeanXcale control programs. It simply sets up the environment for the installed host and runs the command given as an argument:
usage: lx [-d] cmd...
The command operates on the whole LeanXcale system, even when multiple hosts are used.
It is convenient to have lx
in the PATH
environment variable, as
suggested in the install program output.
Command output usually includes information on a per-host basis reporting the progress of the used command.
Most commands follow the same conventions regarding options and arguments. We describe them here as a convenience.
Arguments specify what to operate on (e.g., what to start or stop). They may be empty to rely on the defaults (the whole DB), or they may name a particular host and/or component:
- when only component names are given, only those components are involved (e.g., lxqe101).
- when a component type name is given, all components of that type are selected (e.g., lxqe).
- when a host name is given, any component name following it is narrowed to that host. If no components follow the host name, all components on the host are selected.
This may be repeated to specify different hosts and/or components.
The special host names db, repl, and repl2 may be used and stand for hosts without the nodb attribute, hosts for the first replica, and hosts for the second replica (hosts that are a mirror of other ones).
1.2.1. LeanXcale Commands on Bare Metal Installs
For bare metal installs, it suffices to have the lx
command in the PATH.
It can run on any of the installed hosts.
For example, on an installed host, lx version
prints the installed version:
unix$ lx version
leanXcale v2.2
kv v2.2.2023-09-29.115f5fba70e3af8dc203953399088902c4534389
QE v2.2.2023-09-30.1e5933900582.26a7a5c3420cd3d5d589d1fa6cc
libs v2.2.2023-09-29.67535752acf19e092a6eaf17b11ad17597897956
avatica v2.2.2023-09-27.0b0a786b36e8bc7381fb2bb01bc8b3ed56f49172
TM v2.2.2023-09-29.9a9b22cfdc9b924dbc3430e613cddab4ed667a57
1.2.2. LeanXcale Commands on Docker Installs
To use the lx
command on a docker install, an installed container must be
running, and the command must be called on it.
For example, assume that the container named lx1
is running on a Docker install.
The container could be started using the following command,
assuming the leanXcale image is named lx:2
, and the docker network used is
lxnet
:
unix$ docker run -dit --name lx1 --network lxnet -p0.0.0.0:14420:14420 lx:2 lx1
b28d30702b80028f8280ed6c55297b2e203540387d3b4cfbd52bc78229593e27
It is possible to attach to the container and use the lx command as on a bare metal host install:
unix$ docker attach lx1
lx1$ lx version
...
Here, we type docker attach lx1
on the host, and lx version
on
the docker container prompt.
Note that if you terminate the shell reached when attaching to the docker container, the container will stop. Usually, this is not desired.
It is possible to execute commands directly on the executed container. For example:
unix$ docker exec -it lx1 lx version
executes lx version
on the lx1
container.
In what follows, lx1
is used as the container name in the examples for docker
installs.
1.2.3. LeanXcale Commands on AWS Installs
Using lx
on AWS hosts is similar to using it on a bare-metal install.
The difference is that you must connect to the AWS instance to run the command there.
For example, after installing xample1.aws.leanxcale.com
, and provided the
PEM file can be found at xample.pem, we can run this:
unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx version
to see the installed version.
In what follows, xample.pem
is used as the PEM file name
and xample1.aws.leanxcale.com
is used as the installed instance name,
for all AWS install examples.
2. Start & Stop Particularities on Different Installs
System start depends on how the system has been installed. For bare-metal installations, the administrator installing the system is responsible for adding a system service that brings LeanXcale into operation when the machine starts, and stops LeanXcale before halting the system.
For AWS installations, LeanXcale is added as a service, disabled by default. Do not use the service on multi-host installs: the service starts/stops the DB, which requires all DB processes to be accessible, and that might not be the case with multiple instances.
When the service is enabled, starting the instance starts the LeanXcale service, and stopping the instance stops LeanXcale before the instance stops.
Otherwise, starting LeanXcale requires logging into one of the installed instances and issuing the lx start command.
For Docker installations, starting a container starts the LeanXcale service on it, and, for safety, LeanXcale should be halted before halting the container (otherwise Docker might decide to time out and stop the container before LeanXcale has fully stopped).
3. Licenses
To check the license status or to install a new license, you can use the lx license command.
For a local installation, just run:
unix$ lx license
license expires: Mon Dec 30 00:00:00 2024
For docker installs, each container must include its own license. The container does not start the DB unless a valid license is found. However, the container must be running to check the license status and to install new licenses. Refer to the section on starting docker containers for help on that.
For example, to list the license status for the container lx1
we can run
unix$ docker exec -it lx1 lx license
lx1 [
    kvcon[1380]: license: no license file
    failed: failed: status 1
]
failed: status 1
To install a new license, just copy the license file to the container as shown here
unix$ docker cp ~/.lxlicense lx1:/usr/local/leanxcale/.lxlicense
unix$ docker exec -it lx1 sudo chown lx /usr/local/leanxcale/.lxlicense
The license status should be ok now:
unix$ docker exec -it lx1 lx license
license expires: Mon Dec 30 00:00:00 2024
4. Licenses at LeanXcale
You can run the kvlicense program to generate a license file:
unix$ kvlicense
usage: kvlicense yymmdd file
unix$ kvlicense 243012 lxlicense
kvlicense[34951]: license saved at lxlicense with limit Fri Jun 12 2026
This binary is not built by default and is not included in the distribution. Ask for help if you need it.
5. Starting the System
5.1. Bare Metal System Start
The start
command starts LeanXcale:
unix$ lx start
start...
atlantis [
    cfgile: /ssd/leandata/xamplelx/lib/lxinst.conf...
    bin/spread -c lib/spread.conf ...
    forked bin/spread...
    bin/spread: started pid 1056053
    bin/kvms -D 192.268.1.224!9999 /ssd/leandata/xamplelx/disk/kvms100/kvmeta ...
    forked bin/kvms...
    ...
]
atlantis [
    kvds103 pid 1056084 alive
    kvms100 pid 1056057 alive
    spread pid 1056053 alive
    kvds102 pid 1056075 alive
    kvds100 pid 1056062 alive
    kvds101 pid 1056066 alive
]
unix$
Here, atlantis started a few processes and, once done, the start command checked that the processes are indeed alive.
In case not all components can be started successfully, the whole LeanXcale system is halted by the start command.
By default, no watcher or automatic restart is set up.
Using flag -r asks start to start the system with automatic restarts: any QE that was running and failed will be restarted, unless it was already restarted less than one minute ago.
Using flag -w asks start to also start lx watch.
The watch tool will wait until the system becomes operational and, upon failures, try to restart the whole system.
To start a single host or component, use its name as an argument, like in:
# start the given host
unix$ lx start atlantis
# start the named components
unix$ lx start kvds
# start the named components at the given host
unix$ lx start atlantis kvds
Start does not wait for the system to be operational. To wait until the
system is ready to handle SQL commands, the status
command can be
used with the -w
(wait for status) flag, as in:
unix$ lx status -w running
status: running
Without the -w
flag, the command prints the current status, which
can be stopped, failed, running, or waiting.
5.2. Docker System Start
To start LeanXcale installed on Docker containers, you must start the containers holding the installed system components.
For example, consider the default docker install
unix$ lxinst docker
...
install done
docker images:
REPOSITORY   TAG   IMAGE ID       CREATED         SIZE
uxbase       2     7c8262008dac   3 months ago    1.07GB
lx           2     cafd60d35886   3 seconds ago   2.62GB
docker network:
NETWORK ID     NAME    DRIVER   SCOPE
a8628b163a21   lxnet   bridge   local
to start:
    docker run -dit --name lx1 --network lxnet lx:2 lx1
The install process created a docker image named lx:2
, installed for the docker
host lx1
, and the docker network lxnet
.
To list the image we can run:
unix$ docker images lx
REPOSITORY   TAG   IMAGE ID       CREATED              SIZE
lx           2     75b8c9ffa245   About a minute ago   2.62GB
And, to list the networks
unix$ docker network ls
NETWORK ID     NAME    DRIVER   SCOPE
a8628b163a21   lxnet   bridge   local
The created image is a single one for all containers. The name given when creating the container determines the host name used. The install process specified the host names, and containers must be started using the corresponding host name(s), so they know which LeanXcale host they are for.
For example, to start the container for lx1
:
unix$ docker run -dit --name lx1 --network lxnet -p0.0.0.0:14420:14420 lx:2 lx1
b28d30702b80028f8280ed6c55297b2e203540387d3b4cfbd52bc78229593e27
In this command, the container name is lx1, the network used is lxnet, and the image used is lx:2. The port redirection -p… exports the SQL port to the underlying host.
Listing docker processes now shows the running container:
unix$ docker ps
CONTAINER ID   IMAGE   COMMAND             STATUS          PORTS   NAMES
e81d9d01f40a   lx:2    "/bin/lxinit lx1"   Up 56 seconds   14410   lx1
It is important to know that:
- starting the container will start LeanXcale if a valid license was installed;
- stopping the container should be done only after stopping LeanXcale in it.
The container name (lx1
) can be used to issue commands. For example,
unix$ docker exec -it lx1 lx version
leanXcale v2.1 unstable
kv v2.1.2.14-02-15.c26f496706918e610831c02e99da3676a1cffa47
lxhibernate v2.1.2.14-02-07.f65c5a628afede27c15c77df6fbbccd6d781d3ee
TM v2.1.2.14-02-06.bfc9f92216481dd05f51900ac522e5ccfb6d2555
QE v2.1.2.14-02-15.4a8ff4200dc3d3656c8469b6f74c05a296fbdfb3
avatica v2.1.2.14-02-14.1c442ac9e630957ace3fdb5c4faf92bb85510099
...
executes lx version
on the lx1
container.
The status for the system can be seen in a similar way:
unix$ docker exec -it lx1 lx status
status: running
Note that the container will not start the DB if no valid license is found.
5.3. AWS System Start
To start LeanXcale installed on AWS, you must start the AWS instances holding the installed system components.
This can be done by hand using the AWS console, or using the lxaws
command.
For example, after installing using xample
as an AWS tag, this command
starts the instances:
unix$ lxaws -start xample
Once instances are started, the lx
command is available at any of them.
For example, provided the PEM file can be found at xample.pem, we can run this:
unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx version
to see the installed version.
Here xample1
is the DNS host name registered as the first host for the install
AWS tag xample
. In the same way, xample2
would be the name for the second host, and
so on.
6. Bare Metal Starts at LeanXcale
When using replication, you can start one of the replicas (1 or 2):
unix$ lx start repl1
This addressing scheme can be used in many other commands. See the
reference for the lx
command later in this document for a full
description of addressing.
7. Checking System Status
7.1. Bare Metal System Status
The command lx status
reports the status for the system
or waits for a given status.
For example,
unix$ lx status
status: waiting
kvds100: recovering files
kvds101: recovering files
Or, to wait until the status is running
:
unix$ lx status -v -w running
status: waiting
kvds100: recovering files
kvds101: recovering files
status: running
unix$
To see the status for each one of the processes in the system, use lx procs
.
For example:
unix$ lx procs
procs...
atlantis [
    kvds103 pid 1057699 alive running
    kvms100 pid 1057672 alive running
    spread pid 1057668 alive
    kvds102 pid 1057690 alive running
    kvds100 pid 1057677 alive running
    kvds101 pid 1057681 alive running
]
7.2. Docker System Status
Before looking at the LeanXcale system status, it is important to look at the status of the docker containers running LeanXcale components.
unix$ docker ps
CONTAINER ID   IMAGE   COMMAND             STATUS          PORTS   NAMES
e81d9d01f40a   lx:2    "/bin/lxinit lx1"   Up 56 seconds   14410   lx1
When containers are running,
the command lx status
reports the status for the system
or waits for a given status.
For example,
unix$ docker exec -it lx1 lx status
executes lx status
on the lx1
container.
The status is reported for the whole system, and not just for that container.
To wait until the status is running
:
unix$ docker exec -it lx1 lx status -v -w running
status: waiting
kvds100: recovering files
kvds101: recovering files
status: running
To see the status for each one of the processes in the system, use lx procs
.
For example:
unix$ docker exec -it lx1 lx procs
procs...
atlantis [
    kvds103 pid 1057699 alive running
    kvms100 pid 1057672 alive running
    spread pid 1057668 alive
    kvds102 pid 1057690 alive running
    kvds100 pid 1057677 alive running
    kvds101 pid 1057681 alive running
]
7.3. AWS System Status
Before looking at the LeanXcale system status, it is important to look at the status of the AWS instances running LeanXcale components.
This can be done using the lxaws
-status
flag with the installed AWS tag name:
unix$ lxaws -status xample
#xample.aws.leanxcale.com:
inst i-02bbf1473c01ea6ae xample2.aws.leanxcale.com stopped
inst i-05e0708c0e4965ef0 xample1.aws.leanxcale.com 54.84.39.77 running
When instances are running,
the command lx status
reports the status for the system
or waits for a given status.
For example,
unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx status
to see the system status.
To wait until the status is running
:
unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx status -v -w running
status: waiting
kvds100: recovering files
kvds101: recovering files
status: running
To see the status for each one of the processes in the system, use lx procs
.
For example:
unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx procs
procs...
atlantis [
    kvds103 pid 1057699 alive running
    kvms100 pid 1057672 alive running
    spread pid 1057668 alive
    kvds102 pid 1057690 alive running
    kvds100 pid 1057677 alive running
    kvds101 pid 1057681 alive running
]
8. Stopping the System
8.1. Bare Metal System Stop
The stop
command halts LeanXcale:
unix$ lx stop
stop...
atlantis [
    kvcon[1056801]: halt
]
atlantis [
    term 1056062 2056066 1056075 1056084 1056057 1056053...
    kill 1056062 2056066 1056075 1056084 1056057 1056053...
]
unix$
8.2. Docker System Stop
Stopping the LeanXcale containers should be done after stopping LeanXcale. The reason is that docker might time out the stop operation if the system is too busy updating the disk during the stop procedure.
To stop the database,
unix$ docker exec -it lx1 lx stop
stops the components for the whole system (lx1 being an installed container).
We can double check this
unix$ docker exec -it lx1 lx status
status: stopped
Once this is done, we can stop the docker container.
unix$ docker ps
CONTAINER ID   IMAGE   COMMAND             STATUS          PORTS   NAMES
e81d9d01f40a   lx:2    "/bin/lxinit lx1"   Up 56 seconds   14410   lx1
unix$ docker stop lx1
lx1
unix$ docker ps
unix$
We can also remove the container, but note that doing this removes all data in the container as well.
unix$ docker rm lx1
lx1
unix$
8.3. AWS System Stop
To stop LeanXcale on AWS, you must stop LeanXcale before stopping the AWS instances running it. For example:
unix$ ssh -i xample.pem xample1.aws.leanxcale.com lx stop
This stops the system on all the instances it uses.
Then, the instances can be stopped.
This can be done on the AWS console, or using
the lxaws
-stop
flag with the installed AWS tag name:
unix$ lxaws -stop xample
#xample.aws.leanxcale.com:
inst i-02bbf1473c01ea6ae xample2.aws.leanxcale.com stopping
inst i-05e0708c0e4965ef0 xample1.aws.leanxcale.com stopping
9. System Recovery
The lxmeta
component watches the status for other components and will stop
the system when there is a failure that cannot be recovered online.
Should the system crash or fail-stop, upon a system restart, lxmeta
will
guide the system recovery.
At start time, each system component checks its on-disk information and decides to start either as a ready component or as a component needing recovery.
The lxmeta process guides the whole system start following these steps:
- Wait for all required components to be executing.
- Look up the component status (ready/recovering).
- If there are components that need recovery, their recovery process is executed.
- After all components are ready, the system is made available by accepting queries.
The command lx status can be used either to inspect the system status and the recovery process, or to wait until the recovery finishes and the system becomes available for queries.
10. Distributed and Replicated Installs
In general, the system is used the same way as a single-host install. Refer to the section for the kind of install of interest to learn how to start, stop, and operate the system before reading this section.
As described in the reference manual, most commands take arguments to select particular replicas, hosts, or components. This is the case for the start and stop commands. On replicated installs it is important to start and stop the whole system.
Starting the whole system checks that replicas are synchronized and takes care of updating outdated metadata of a previously failing or stopped replica.
If a replica is not reachable and start
cannot ensure that the system would start
with the most recent metadata, the system will not start.
On distributed and replicated installs it is possible to ask start
to proceed
with just a single replica or a single host or set of components.
This is done by calling start
with arguments that name just a replica (or perhaps
a host or a set of components).
On replicated installs, two useful names are repl1
and repl2
, to
ask a command to operate on the first or the second replica.
By convention, the first replica is the set of hosts configured that are not mirrors, and the second replica is the set of hosts that are mirrors of former hosts.
As an example, we use this configuration file
# lxinst.conf
host blade110
    kvds
host blade161
    mirror blade110
In this case, the first replica is just blade110
and the second replica
is just blade161
.
Installing the system on bare metal is done using
unix$ lxinst -f lxinst.conf
To start the system we execute
unix$ lx start
start...
blade110 [
    bin/spread -c lib/spread.conf ...
    forked bin/spread...
    ...
]
blade161 [
    bin/spread -c lib/spread.conf ...
    forked bin/spread...
    ...
]
unix$
We can ask for the system status or wait for a status as usual:
unix$ lx status
status: running
To stop the system:
unix$ lx stop
stop...
blade110: [
    stop: term lxqe100.r1 pid 460953
    stop: term lxmeta100.r1 pid 460932
    stop: term kvds100.r1 pid 460927
    stop: term kvms100.r1 pid 460923
    stop: term spread pid 460919
]
blade161: [
    stop: term lxqe100.r2 pid 250984
    stop: term lxmeta100.r2 pid 250959
    stop: term kvds100.r2 pid 250955
    stop: term kvms100.r2 pid 250950
    stop: term spread pid 250946
]
10.1. Partial Starts and Stops
When using multiple hosts and replication, it is possible to start and stop individual hosts or replicas and force the system to run using just those.
For example,
unix$ lx stop repl1
stop...
blade110: [
    stop: term lxqe100.r1 pid 443056
    stop: term lxmeta100.r1 pid 443035
    stop: term kvds100.r1 pid 443030
    stop: term kvms100.r1 pid 443026
    stop: term spread pid 443022
]
stops the processes in replica-1.
To start it again, we can proceed in a similar way:
unix$ lx start repl1
blade110 [
    bin/spread -c lib/spread.conf ...
    forked bin/spread...
    bin/spread: started pid 446756
    ...
]
blade110 [
    kvds100.r1 pid 446764 alive disk open
    spread pid 446756 alive
    lxmeta100.r1 pid 446769 alive starting
    lxqe100.r1 pid 446790 alive
    kvms100.r1 pid 446760 alive
]
Stopping a replica while the system is running is strongly discouraged. Using it again requires restoring the replica state to make it work with the rest of the system.
In this example, if the whole system was running when lx stop repl1
was used,
starting repl1
again will reintegrate it into the system if possible.
However, if we have a fully stopped system, and run
unix$ lx start repl1
the system will run just the first replica. This will happen even if the second replica is not reachable and there is no way to ensure that the metadata in the first replica is up-to-date.
To ensure that the metadata is up-to-date in partial starts, use flag -c.
This performs the same checks made when starting the whole system, and ensures that metadata is up to date, before attempting a start of the named replica or components.
10.2. System Status and Replication
The command lx status
reports the status for the system
or waits for a given status, as described for other installs in this document.
For example,
unix$ lx status
status: running
To see the replication mirror status for the system, use
flag -r
unix$ lx status -r
status: running
replica: ok
And, to see detailed information about components, use flag -p
, perhaps
in addition to -r
:
unix$ lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 1173999 running
kvds100.r2 mirror alive snap 1173999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 1173999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 929999 running
lxqe100.r2 mirror alive snap 916999 running
In this example, all components are running with their mirror set as ok.
When some components failed, or part of the system was stopped, we can see a different output.
For example, after
unix$ lx stop repl1
we can see
unix$ lx status -rp
status: running
replica: single
kvds100.r2 single alive snap 228678999 single running
kvds100.r1 outdated stopped snap 228161999
kvms100.r1 mirror stopped
kvms100.r2 mirror alive
lxmeta100.r2 mirror alive snap 362692999 running
lxmeta100.r1 mirror stopped snap 108718999
lxqe100.r2 mirror alive snap 362664999 running
lxqe100.r1 mirror stopped snap 227825999
The first thing to note here is that the replica is not ok
, but single
.
This means we have single processes (without their mirrors) and the system
is running in degraded mode.
Also, kvds100.r2
status with respect to replication is single
.
This means that it was running while its peer (kvds100.r1
, in the first
replica) was stopped.
This server will not be used again until it has been brought up to date with respect to the rest of the system.
Note how kvds100.r1
status with respect to replication is outdated
.
This means it passed away (failed or halted) while its peer was still in use.
The same happens to lxqe100.r2
, but in this case both query engines
could synchronize their mirrors after restarting lxqe100.r1
and nothing else
was needed to permit the restarted process to work with the rest of the system.
10.3. System Metadata and Replication
To inspect the status for a replicated system it is useful to look at
DB metadata as stored on disk.
On replicated systems the whole system is using a master metadata
server (kvms
), which synchronizes metadata with its mirror server.
Looking at the disk information may aid in diagnosing the state for the system when some replica is not running, or has been retired from service.
The dbmeta
command can be used to do this.
For example, after running
unix$ lx start repl1
on a replicated system previously halted, we can see
lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 163191999 running
kvds100.r2 mirror stopped snap 110005999
kvms100.r1 mirror alive
kvms100.r2 mirror stopped
lxmeta100.r2 mirror stopped snap 497103000
lxmeta100.r1 mirror alive snap 163670999 running
lxqe100.r1 mirror alive snap 163422999 running
lxqe100.r2 mirror stopped snap 109917999
The system has not been used, so the mirror status is still ok.
The output for dbmeta
returns what is known by the running kvms
server:
unix$ lx dbmeta /srv
dbmeta...
# kvms100.r1
kvms blade110!14400
kvms blade161!14400
kvds ds100.r1 at blade110!14500 snap 172793999 rts 112774999
lxmeta mm100.r2 at blade161!14410 snap 497103000
kvds ds100.r2 at blade161!14500 snap 110005999 rts 109069999
lxmeta mm100.r1 at blade110!14410 snap 172793999
lxqe qe100.r1 at blade110!14420 snap 173016999
lxqe qe100.r2 at blade161!14420 snap 109917999
We asked just for metadata of servers using the /srv
resource path name.
The interesting part is that we can ask for metadata as known by both replicas:
unix$ lx dbmeta -a /srv
dbmeta...
blade110:
# kvms100.r1
kvms blade110!14400
kvms blade161!14400
kvds ds100.r1 at blade110!14500 snap 174236999 rts 112774999
lxmeta mm100.r2 at blade161!14410 snap 497103000
kvds ds100.r2 at blade161!14500 snap 110005999 rts 109069999
lxmeta mm100.r1 at blade110!14410 snap 174236999
lxqe qe100.r1 at blade110!14420 snap 174457999
lxqe qe100.r2 at blade161!14420 snap 109917999
blade161:
# kvms100.r2
kvms blade161!14400
kvms blade110!14400
kvds ds100.r1 at blade110!14500 snap 110005999 rts 109069999
kvds ds100.r2 at blade161!14500 snap 110005999 rts 109069999
lxmeta mm100.r2 at blade161!14410 snap 497103000
lxmeta mm100.r1 at blade110!14410 snap 110005999
lxqe qe100.r1 at blade110!14420 snap 109920999
lxqe qe100.r2 at blade161!14420 snap 109917999
It can be seen how replica-2 (that for kvms100.r2
) is way out of date
at least with respect to snapshots.
This is not a surprise because it is stopped.
We can ask for the full metadata using
unix$ lx dbmeta -a
or for that for a particular table or index.
10.4. Failures
When there is a failure, the system continues to operate using the mirror processes that remain alive.
Here we describe example failures, and provide details about repairing specific failed components. Then we describe how to use lx recover to try to restore things in a more convenient way.
For example, if qe100.r2
fails (we killed it to make it so), this can
be seen:
unix$ lx status
status: running with failures
Further details are reported by flags -r
(replication) and -p
(process):
unix$ lx status -rp
status: running with failures
replica: ok
kvds100.r1 mirror alive snap 232084999 running
kvds100.r2 mirror alive snap 232084999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 232564999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 232416999 running
lxqe100.r2 mirror dead snap 228005999
Component lxqe100.r1
is alive and running, and lxqe100.r2
is dead.
Using now the database produces a change in status:
lxqe100.r1 single alive snap 267003999 single running
lxqe100.r2 outdated dead snap 228005999
This means that lxqe100.r1
is known to be single, i.e., it has been used
while its mirror was dead or halted.
Also, lxqe100.r2
is known to be outdated
, i.e., its mirror has been used
while it was dead or halted.
10.5. Restoring LXMETA Failures
Recovering from lxmeta
failures is trivial because the component
simply rebuilds its state from the running system.
10.6. Restoring KVMS Failures
Recovering from kvms
failures requires making sure that when the system
starts, the new master has the most recent metadata on disk.
kvms servers keep the metadata synchronized, and there is nothing special to be done to recover them from a failure as long as they can reach the current master server or the disk data for the new master is up to date.
For example, if the master kvms
dies, we see this as the status:
unix$ lx status -rp
status: running with failures
replica: ok
kvds100.r1 mirror alive snap 100079999 running
kvds100.r2 mirror alive snap 99122999 running
kvms100.r1 mirror dead
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 100079999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 99912999 running
lxqe100.r2 mirror alive snap 99449999 running
The system continues to operate using kvms100.r2
.
To recover at this point, it suffices to restart the failed kvms
:
unix$ lx start kvms100.r1
It takes its state from kvms100.r2
, which is the current master, and the
system is ok.
However, if the system stops before updating the kvms100.r1 disk with the possibly newer metadata, it can happen that on the next restart it becomes the new master while holding old data, leading to problems.
When used to start the whole system,
lx start
takes care of updating the metadata on disk for
kvms
components with that from the previous master (the one with newest
timestamps in it).
When starting components by hand or by individual hosts, the kvms
data
should be updated on disk for the replicas with the data from the newest one.
That is, unless the server keeping the previous kvms
master is unreachable,
running
unix$ lx start
suffices to use up to date metadata despite previous kvms
failures.
But explicitly starting kvms100.r1 after its failure, without updating its disk metadata, and with the rest of the system stopped or unreachable, will lead to an execution with old system metadata, and thus to problems.
This is only important when there are failures or explicit stops for parts of the system. Otherwise, all metadata is synchronized and there should be no problem.
10.7. Restoring QE Replica Failures
Doing a system stop, and a system start, should recover failed query engines without doing anything else. But this is not always possible or desirable.
In the case of query engine failures, we can recover them while the system is running by starting the failed query engine.
unix$ lx start lxqe100.r2
Asking for the status for replica and processes yields:
unix$ lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 309358999 running
kvds100.r2 mirror alive snap 309358999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 309358999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 309900999 running
lxqe100.r2 mirror alive snap 309253999 running
If the system has been running for too long while a query engine was stopped, it may be better to recover the failed query engine disk state from the mirror that is still alive.
The following command recreates the disk for the replica lxqe100.r2 from the disk of lxqe100.r1.
The argument order is similar to cp
, i.e., source and then destination.
unix$ lx copy lxqe100.r1 repl
copy blade110 lxqe100.r1 blade161 lxqe100.r2
Here, it is important to use repl
as the (source or) target argument name,
to let copy
know that it is updating a replica.
Use lx copy
with care.
It blindly copies the disk from one component to another, and you might overwrite
data in the process.
Flag -n
makes the command report what it would do,
and it is sensible to use it
before actually copying anything.
More details are printed using flag -v
:
unix$ lx copy -nv lxqe100.r1 repl
#disk...
copy blade110 lxqe100.r1 blade161 lxqe100.r2
copy local disk/lxqe100.r1/log blade161 disk/lxqe100.r2/logm
copy local disk/lxqe100.r1/logm blade161 disk/lxqe100.r2/log
Here, log is the log data and logm is the mirror kept for the mirror QE; they are exchanged after the copy to make lxqe100.r2 a replica for lxqe100.r1.
10.8. Restoring KVDS Replica Failures
In the case of a kvds
failure, bringing it back into operation requires
repairing it by updating its data with that from its mirror.
For example, using our example install and killing kvds100.r2
leads to
this status:
unix$ lx status -rp
status: running with failures
replica: single
kvds100.r1 single alive snap 5735999 single running
kvds100.r2 outdated dead snap 3839999
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 6688999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 6602999 running
lxqe100.r2 mirror alive snap 6599999 running
Here, kvds100.r1
continued to run (has a single
status) and
kvds100.r2
failed while its mirror was running (has an outdated
status).
The whole system replication status is now single
, because of this.
Starting kvds100.r2
by hand does not change things.
Its state is probably out of date with respect to its mirror, and it is not
integrated into the running system until restored.
unix$ lx start kvds100.r2
...
unix$ lx status -rp
status: running
replica: single
kvds100.r1 single alive snap 12421999 single running
kvds100.r2 outdated alive snap 13379999 disk open
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 13379999 running
lxmeta100.r2 mirror alive running
lxqe100.r1 mirror alive snap 13282999 running
lxqe100.r2 mirror alive snap 13281999 running
Stopping the system still preserves the status for the single and failed server:
unix$ lx stop
...
unix$ lx dbmeta -a /srv
dbmeta...
blade110:
# kvms100.r1
kvms blade110!14400
kvms blade161!14400
lxmeta mm100.r2 at blade161!14410
kvds ds100.r1 at blade110!14500 single snap 16247999
kvds ds100.r2 at blade161!14500 fail snap 16247999
lxmeta mm100.r1 at blade110!14410 snap 16247999
lxqe qe100.r1 at blade110!14420 snap 16147999
lxqe qe100.r2 at blade161!14420 snap 16146999
blade161:
# kvms100.r2
kvms blade161!14400
kvms blade110!14400
lxmeta mm100.r2 at blade161!14410
kvds ds100.r1 at blade110!14500 single snap 16247999
kvds ds100.r2 at blade161!14500 fail snap 16247999
lxmeta mm100.r1 at blade110!14410 snap 16247999
lxqe qe100.r1 at blade110!14420 snap 16147999
lxqe qe100.r2 at blade161!14420 snap 16146999
Restarting the system at this point will force a recovery from the DB log data,
because there was a single kvds
and its mirror must be updated before
it can run user operations.
unix$ lx start
...
unix$ lx status -rp
status: running
replica: ok
kvds100.r1 mirror alive snap 18023999 running
kvds100.r2 mirror alive snap 18023999 running
kvms100.r1 mirror alive
kvms100.r2 mirror alive
lxmeta100.r1 mirror alive snap 18023999 running
lxmeta100.r2 mirror alive running
lxqe100.r2 mirror alive snap 17930999 running
lxqe100.r1 mirror alive snap 17919999 running
When the kvds
server has been down for too long, it is better to restore its
disk from its surviving mirror before restarting the system.
Consider the system after kvds100.r2
failed, and it was stopped using
lx stop
.
Instead of using lx start
directly, we can use lx copy
to restore
the obsolete and failed kvds100.r2
:
unix$ lx copy kvds100.r1 repl
copy blade110 kvds100.r1 blade161 kvds100.r2
Copying a kvds
disk can take a long time.
10.9. Using the recover command
The command lx recover
inspects metadata and tries to restore the disk for
failed components.
It can be used before starting the system, to let lx start
start an already
restored replicated system.
As an example, with the example system running, we killed both
kvds100.r1
and lxqe100.r2
and stopped the system after that.
This is the resulting server metadata:
unix$ lx dbmeta /srv
dbmeta...
# kvms100.r1
meta ts 15264000
kvms blade110!14400
kvms blade161!14400
lxmeta mm100.r2 at blade161!14410
kvds ds100.r1 at blade110!14500 fail snap 2101999
kvds ds100.r2 at blade161!14500 single snap 14804999
lxmeta mm100.r1 at blade110!14410 snap 15264000
lxqe qe100.r1 at blade110!14420 single snap 14736999
lxqe qe100.r2 at blade161!14420 snap 925999
A dry run (flag -n
) for lx recover
shows this:
unix$ lx recover -n
#disk...
copy blade110 kvds100.r2 blade161 kvds100.r1
clear kvds100.r2 flag single
clear kvds100.r1 flag fail
copy blade110 lxqe100.r1 blade161 lxqe100.r2
clear lxqe100.r1 flag single
clear lxqe100.r2 flag fail
We can use lx copy
and lx dbmeta -w
to copy disks and clear flags,
but it is more convenient to run this command without the dry run flag
once we decide to do so.
The recover
command can take arguments
to select what should be recovered, following
the style of most other commands. For example:
lx recover kvds100
#recover...
copy blade110 kvds100.r2 blade161 kvds100.r1
clear kvds100.r2 flag single
clear kvds100.r1 flag fail
After a recover, the system can be started normally.
11. Running SQL Queries
To use the database with the standard SQL client, you can use this command supplying the user name and secret:
unix$ lx sql -n lxadmin -p ****
lx% !tables
...
It is suggested not to give the password on the command line, but to open the connection later within the sql prompt, to prevent the password from showing up in the process list of the underlying system. The above was just an example.
The isolation mode can be set using the -t argument, as in:
unix$ lx sql -Msession -n lxadmin -p **** -t rawrc
lx% !tables
...
LeanXcale isolation levels are (from strongest to weakest):
- snapshot_isolation or si: standard snapshot isolation; when the transaction starts it gets the current snapshot, and all reads are performed over that snapshot.
- read_committed or rc: each SQL statement gets the current snapshot and reads are performed from that snapshot.
- raw_read_committed or rawrc: each read performed by an SQL statement gets the latest committed value.
- loader: special isolation level to accelerate database loads. It is like read committed but it performs neither conflict detection nor logging. This can only be used when no other transactions are accessing the database.
Communication between the SQL client and the query engine(s) is not encrypted by
default.
Install using the tls
property to ask for encrypted communications.
The lx sql
command does this on its own, but,
when using a connection URL on a standard JDBC client, add the
tls=yes
property to the connection property set. This tells the leanXcale client driver to use TLS.
It is possible to supply the isolation mode on the URL using
mode=read_committed
as a property in the URL.
But note that when this is used in lx sql
, the console will still overwrite
the URL mode with either its default mode or the one set using the -t
flag.
It is possible to supply extra addresses when multiple servers are available, for
load-balancing and for HA.
The lx sql
command does this on its own, but you may supply extra addresses in the
xaddr
property separated by commas.
For example
xaddr=host2:14420,host3:14420
adds two extra server addresses besides the one given in the URL.
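For illustration only, here is a minimal JDBC sketch in Java, assuming the leanXcale JDBC driver JAR (see the DBeaver section below) is on the classpath. The host names, credentials, and the convention of passing user and password as connection properties are assumptions to adapt to your install; the driver class, URL template, and the tls, mode, and xaddr properties are the ones described in this section and the next.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class LxConnect {
    public static void main(String[] args) throws Exception {
        // Driver class and URL template as listed in the DBeaver section.
        Class.forName("com.leanxcale.client.Driver");
        Properties props = new Properties();
        props.setProperty("user", "lxadmin");          // placeholder credentials
        props.setProperty("password", "****");
        props.setProperty("tls", "yes");               // ask for encrypted communication
        props.setProperty("mode", "read_committed");   // isolation mode via URL/connection property
        // extra query engine addresses for load balancing / HA (placeholder hosts)
        props.setProperty("xaddr", "host2:14420,host3:14420");
        try (Connection con = DriverManager.getConnection(
                "jdbc:leanxcale://host1:14420/db", props);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}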
11.1. Using DBeaver
To use dbeaver
with leanXcale, just use the standard JDBC driver and configure
the URI to access the installed system.
For example, download the JDBC driver for this version from Maven Central (leanXcale drivers). Usually, you want the latest driver. As of now, it is the leanXcale JDBC driver 3.2. Then follow these steps:
- Add the driver at Database, Driver manager.
- At settings, use com.leanxcale.client.Driver as the driver.
- At settings, use jdbc:leanxcale://{host}:{port}/{database} as the URL template.
- At settings, use 14420 as the default port.
- At settings, use db as the default database.
- At the libraries tab, use add file and give the path to the downloaded JDBC driver.
At this point you can add the connection using the DB type just configured and the user and password as needed.
12. Configuring the System
The lx config
command prints or updates the configuration used for the
LeanXcale system:
unix$ lx config
cfgile: /usr/local/leanxcale/lib/lxinst.conf...
#cfgfile /usr/local/leanxcale/lib/lxinst.conf
host localhost
    lxdir /usr/local/leanxcale
    JAVA_HOME /usr/lib/jvm/java-1.11.0-openjdk-amd64
    addr 127.0.0.1
    odata 100
        addr localhost!14004
    kvms 100
        addr 127.0.0.1!14400
    lxmeta 100
        addr 127.0.0.1!16500
    lxqe 100
        addr 127.0.0.1!16000
    kvds 100
        addr 127.0.0.1!15000
    kvds 101
        addr 127.0.0.1!15002
    kvds 102
        addr 127.0.0.1!15004
    kvds 103
        addr 127.0.0.1!15006
The configuration printed provides more details than the configuration file (or command line arguments) used to install.
NB: when installing into AWS or docker, the configuration printed
might lack the initial aws
or docker
property used to install it.
It is possible to ask for particular config entries, like done here:
unix$ lx config kvms addr
It is also possible to adjust configured values, like in
unix$ lx config -s lxqe mem=500m
used to adjust the lxqe
mem
property to be 500m
in all lxqe
components configured.
13. System Logs
Logs are kept in a per-system directory named log, located at the install directory for each system.
Log files have names similar to
kvms100.240214.1127.log
Here, the component name comes first, and then, the date and time when the log file was created. When a log file becomes too big, a new one is started for the given component.
It is convenient to use the lx logs
command to list and inspect logs.
It takes care of reaching the involved log files on the involved hosts.
For example, to list all log files:
unix$ lx logs
logs...
atlantis: [
    log/kvds100.240214.1127.log 250.00K
    log/kvms100.240214.1127.log 35.39K
    log/lxmeta100.240214.1127.log 24.10K
    log/lxqe100.240214.1127.log 445.80K
    log/spread.240214.1127.log 963
    log/start.log 426
]
Here, atlantis
was the only system installed.
We can give host and/or component names as in many other commands to focus on those systems and/or components.
For example, to list just the logs for kvms processes:
unix$ lx logs kvms
logs...
atlantis: [
    log/kvms100.240214.1127.log 35.39K
]
Or, to list only those for the kvms100
:
unix$ lx logs kvms100
To list logs for the atlantis
host:
unix$ lx logs atlantis
To list logs for kvms
components within atlantis
:
unix$ lx logs atlantis kvms
To list logs for kvms
components at atlantis
and kvds
at orion
:
unix$ lx logs atlantis kvms orion kvds
With flag -p
, logs are printed in the output instead of being listed.
unix$ lx logs -p lxmeta
atlantis: [
    log/lxmeta100.240214.1127.log 24.10K [
        # pid 3351755 cmd bin/javaw com.leanxcale.lxmeta.LXMeta -a atlantis!14410
        ...
Flag -g
greps the logs for lines with the given expression.
For example:
unix$ lx logs -g fatal
And, flag -c
copies the logs to the given directory
unix$ lx logs -c /tmp
When printing and copying the logs, only the last log file for each component
is used.
To operate on all the logs and not just on the last one, use flag -a
too:
unix$ lx logs -a -c /tmp
14. Logs and Diagnostics
In this section we describe some information that can be found in the system log files.
14.1. Authentication Diagnostics
The lxqe
log files include authentication information that can be checked out when
in doubt.
Lines like
2024-02-16 08:45:08,752 INFO lxqe: authenticate: lxadmin: local: yes
report that the user lxadmin
(or whoever it was) was authenticated by the local
(i.e., the DB) user entry.
When using LDAP, the line would be
2024-02-16 08:45:08,752 INFO lxqe: authenticate: db-USR1: ldap: yes
This reports both the database used (db
) and the user involved (USR1
).
Authentication errors are reported in a similar way:
2024-02-16 08:48:09,992 INFO lxqe: authenticate: db-USR2: ldap: no: [LDAP: error code 49 - Invalid Credentials]
We folded the line to make it easier to read.
14.2. Login and Auditing
The audit
log file contains auditing entries, including user authentication.
These can be enabled using the AUDIT
SQL statement, although authentication auditing
is always enabled and cannot be disabled.
For example, to locate authentication entries for the user lxadmin
the flag -g
(for grep) can be used as shown here
unix$ lx logs -g auth.*lxadmin audit
logs...
atlantis: [
    log/audit.240221.0708.log 1.75K [
        1: 2024-02-21 07:08:10 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
        2: 2024-02-21 07:08:12 lxqe100: audit: auth: db-USR2: 127.0.0.1: local: yes
        3: 2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
        4: 2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
        5: 2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
        ...
    ]
]
The user names include both the database and the user name, as a convenience.
An exception to this rule is lxadmin
.
Authentication failures are reported with lines like
2024-02-29 06:40:52 lxqe100: audit: auth: db-FOO: 127.0.0.1:39018: local: no:auth failed
If user or table auditing is enabled on a per-session basis, read, write, and change (alter) accesses are reported once per session. For example, after executing any of these:
AUDIT TABLE MYTBL
AUDIT USER USR1
AUDIT USER USR1 TABLE MYTBL
the audit log file will include reports like:
2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: /db/USR1/tbl/MYTBL
More expensive auditing reports the statements executed (even if they fail), and can be enabled using, for example
AUDIT USER USR1 BY ANY STATEMENT
It is usually more sensible to audit just READ, WRITE, or DDL statements instead of auditing all of them.
In this case, the audit log will contain lines like the following, and it may grow quickly.
2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: SELECT count(*) FROM MYTBL
When permission is denied to execute a statement, an audit record is added if any audit has been enabled.
2024-02-29 06:40:51 lxqe100: audit: perm: db-TEST1: /db/APP/tbl/PERSONS: select on table: permission denied
The line has been folded for readability, although it is a single line in the audit log.
It is important to know that
the lxqe
log file reports failed permission checks even when
auditing is disabled.
There is no need to enable auditing just to search for failed permission checks.
15. Backups
Backups can be made to an external location (recommended to tolerate
disk failures) or within an installed host. External backups (i.e.,
to an external location) are made using the lxbackup
tool, installed
at the bin
directory, which works with a given configuration file.
Internal backups (i.e., to a directory on installed hosts) are made
using the lx backup
tool instead.
usage: lxbackup [-h] [-v] [-D] [-n] [-f cfgfile] [-d dir] [-i] [-e] [-r] [-o] [-F] [what [what ...]]

make a backup

positional arguments:
  what        host/comps

optional arguments:
  -h, --help  show this help message and exit
  -v          verbose
  -D          enable debug diags
  -n          dry run
  -f cfgfile  config for the install
  -d dir      root backup dir
  -i          incremental backups
  -e          encrypt
  -r          restore
  -o          online backup
  -F          force command
Using lxbackup is exactly like using lx backup with a few differences:
- Flag -f is mandatory on the first invocation and provides the installed configuration file.
- The default backup directory is not $LXDIR/dump, but ./lxdump.
- The command kvtar must be available at the host running lxbackup.
The external host must have ssh access to the installed hosts.
The configuration file used must be the one retrieved using lx config
, and
not the one written by the user to perform the install, because lx config
reports addresses and details needed by the command.
Once a backup has been created, the configuration used is saved along with the
backup data and there is no need to use -f
to supply it.
Encryption/decryption happens at the source/destination of data when
creating/extracting backups.
Therefore, there is no key kept at the host running lxbackup
.
To prepare a host to use lxbackup
, get the lxbackup
command, the configuration,
and the kvtar
command and copy them to the host.
You might want to copy also lxrestore
and lxbackups
.
These commands can be found at the installed $LXDIR/bin
directory on any installed host. If you are not sure regarding the
$LXDIR
value, use this command to find it:
unix$ lx -d pwd
/usr/local/leanxcale
unix$
Flag -d
for lx
makes it change to the $LXDIR
directory
before running the given command.
The detailed configuration file to be used is kept at $LXDIR/lib/lxinst.conf
.
The configuration can be retrieved using the
lx config
command.
For example, this creates ./lxinst.conf
with the detailed configuration:
unix$ lx config -o lxinst.conf
saved lxinst.conf
As an example, we can setup an external host named
orion
to perform backups in this way:
unix$ lx -d pwd
/usr/local/leanxcale
unix$ lx config -o /tmp/lxinst.conf
saved /tmp/lxinst.conf
unix$ cd /usr/local/leanxcale/bin
unix$ scp lxbackup lxbackups lxrestore kvtar /tmp/lxinst.conf orion:~
And then just:
orion$ lxbackup -f lxinst.conf
To create a full, cold backup when LeanXcale is not running:
orion$ lxbackup -f lxinst.conf
#disk...
new 240809
make atlantis disk/kvms100 /usr/local/leanxcale/dump/240809/kvms100
make atlantis disk/kvds100 /usr/local/leanxcale/dump/240809/kvds100
make atlantis disk/kvds101 /usr/local/leanxcale/dump/240809/kvds101
make atlantis disk/lxqe100/log /usr/local/leanxcale/dump/240809/lxqe100
unix$
Flag -f
must be used the first time at least. Once a backup has been made,
the configuration is saved and there is no need to supply it again.
The printed name is the name for the directory keeping the backup, as
used when restoring it. In this case, 240809
.
The files kept in the backup are compressed, and must be uncompressed if copied by hand. The restore command takes care of this.
Using flag -e
both encrypts and compresses the backup files.
Flags belong to disk
, note that backup
is the argument given to disk
.
The key used to encrypt the backed-up files is kept at $LXDIR/lib/lxkey.pem on the installed sites, as set up by the installer.
orion$ lxbackup -e
#disk...
new 24080901
make atlantis disk/kvms100 /usr/local/leanxcale/dump/24080901/kvms100 crypt
make atlantis disk/kvds100 /usr/local/leanxcale/dump/24080901/kvds100 crypt
make atlantis disk/kvds101 /usr/local/leanxcale/dump/24080901/kvds101 crypt
make atlantis disk/lxqe100/log /usr/local/leanxcale/dump/24080901/lxqe100 crypt
To perform a backup while the system is running, supply flag -o
(for online).
15.1. Incremental Backups
To create an incremental backup use the flag -i
of lxbackup
(or the same flag with lx backup
for an internal backup).
orion$ lxbackup -i
#disk...
new 24080902+
incr: lxqe100 += lxqe100
The output reports which components got files backed up.
The incremental backup is encrypted if the total backup it refers to is
encrypted too. No flag -e
should be given.
The incremental backup can be performed while the system is running.
15.2. Listing Backups
To list the backups known use lxbackups
(or lx backups
for internal backups).
orion$ lxbackups
240810 ts 1419000
24081002+ ts 1419000
...
Those with a +
in their names are incremental backups.
Verbosity can be increased adding one or more -v
flags:
orion$ lxbackups -v
#disk...
240810 ts 1419000
240810/kvms100 ts 2551000
240810/kvds100 ts 1292000
...
orion$ lxbackups -vv
#disk...
240810 ts 1419000
240810/kvms100 ts 2551000
240810/kvms100/kvmeta sz 239 ts 2551000 mt 1723285636
240810/kvds100 ts 1292000
240810/kvds100/dbf00001.kv sz 393216 ts 1419000 mt 1723285636
...
orion$ lxbackups -vvv
#disk...
240810 ts 1419000
240810/kvms100 ts 2551000
240810/kvms100/kvmeta sz 239 mt 1723285636 localhost ts 1 2551000 bck 0 0 na 0
240810/kvds100 ts 1292000
240810/kvds100/dbf00001.kv sz 393216 mt 1723285636 localhost ts 1292000 1419000 bck 0 0 na 3 tname db-APP-PERSONS rmin: tpmin rmax: tpmax
...
It is possible to list a single backup by supplying its name and/or specific components, using arguments as done with most lx commands:
orion$ lxbackups 24080901
or
orion$ lxbackups 24080901 lxqe
15.3. Removing Old Backups
To remove a backup, it suffices to remove its directory. For example:
unix$ lx -d rm -rf dump/230720
Here we used the flag “-d” for lx
to change to $LXDIR
before
executing the remove command, which makes it easy to name the directory
used for the dump.
Or, from our external backup example host:
orion$ rm -rf lxdump/230720.2
Beware that if you remove a backup, you should remove also those incremental backups that follow, up to the next total backup.
15.4. Restore
To restore a backup, use the lxrestore
command (or lx restore
for internal backups).
orion$ lxrestore -v
#disk...
check...
restore 24080904...
restore atlantis disk/kvds100 from /usr/local/leanxcale/dump/24080904/kvds100 crypt
dbf00001.kv
dbf00003.kv
...
Do this while the system is stopped. By default, it selects the last backup made.
To restore a particular backup, supply its name:
orion$ lxrestore 24080904
This can be done also for incremental backups. When restoring an incremental backup, the restore takes data also from previous incremental backups and from the previous total backup.
To restore only specific hosts or components, supply their names or types as done for other commands:
orion$ lxrestore lxqe
Or perhaps:
orion$ lxrestore 24080904 lxqe
To restore a backup, it is usually desirable to format the disk for the
involved components before using restore
to restore their disks.
16. Asynchronous Replication from Another LeanXcale Instance
It is possible to configure an install to make it pull changes from a remote one. In this case, the pulling system will fetch transactions made on the source system and apply them locally.
Applied transactions are not subject to conflict checks and the like, because the aim is to update the target system with respect to the source one, and the source one already performed those transactions.
The target keeps trying to reach the source system and, when connected, pulls changes and applies them. Should there be any error during an apply, it is considered a fatal error and the query engine where that happens will stop and signal the error.
If there are more query engines, the system might still continue running, depending on how it has been configured. But it is suggested to use a single query engine on a system pulling changes.
As an example, this configuration file pulls changes from a system installed
at hosts orion
and rigel
.
#cfgfile
host atlantis
    lxqe
        LXPULL orion!14420;rigel!14420
To learn the addresses for the query engines of a particular install, use lx config
:
orion$ lx config
It is important to add all the addresses for query engines, or some transactions in the source system might be missed.
Once running, the lx status
command can show that the system is pulling.
Use the flag -p
to see the status for each process.
atlantis$ lx status -p
status: running
kvds100 alone alive snap 66709999 running
kvms100 alone alive
lxmeta100 alone alive snap 66709999 running
lxqe100 alone alive snap 66736999 pulling from orion!14420 rigel!14420
Here, we can see that lxqe100
is pulling from a couple of remote query engines.
To stop pulling for a while, you can use a control request for the pulling query engine. For example:
atlantis$ lx kvcon ctl qe100 pull stop
The status line for lxqe100
should now say not pulling
.
To start pulling again, there is a similar control request:
atlantis$ lx kvcon ctl qe100 pull start
To change the addresses, although another control request (not shown here) can be used, it is usually better to stop the pulling system, change the configuration, and restart it.
This can be done using lx config
with the -s
flag to set a property in the configuration.
For example:
atlantis$ lx config -s lxqe100 'LXPULL=blade161!14424'
changes the configuration as it can be seen:
atlantis$ lx config
#cfgfile lib/lxconfig.conf
size small
host atlantis
    kvds 100
        addr atlantis!14500
        mem 1024m
    kvms 100
        addr atlantis!14400
    lxmeta 100
        addr atlantis!14410
    lxqe 100
        addr atlantis!14420
        mem 1024m
        LXPULL blade161!14424
17. Forwarding System Updates
It is possible to execute a user program to process or forward every update made to the data on the installed system.
The program processing changes must be installed in the user bin directory with the name fwd, or as a JAR file in the user lib directory with the name fwd.jar.
The program runs with the different library, python, and class paths set to include the
user lib directory, where extra libraries can be installed as well.
Its current directory is the install directory at the running location.
Temporary files should be created in ./tmp.
Any diagnostics should be printed on the standard error stream, as every other command does.
The program output and errors are saved in log files, as done for other system components, and can be inspected with the lx logs command.
The program takes the option -f (follow) to keep on following changes, and the name of the directory where to look for files named changes* reporting updates.
It should remove the processed files as soon as they have been processed.
For java, the class must be exactly com.leanxcale.usr.Changes
and supply a main method.
Changes files might be removed if not processed after many other changes have been reported, to avoid disk usage problems.
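For reference, the following is a minimal Java sketch of such a forwarder, not a definitive implementation: it only illustrates the conventions described above (class com.leanxcale.usr.Changes with a main method, the -f flag, a directory argument, and changes* files removed once processed). The polling interval and what "processing" a file means are assumptions left as placeholders.
package com.leanxcale.usr;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Skeleton forwarder: scans the given directory for changes* files,
// "processes" them, and removes them afterwards. The format of the
// changes files and the forwarding target are not described in this
// guide, so they are left out here.
public class Changes {
    public static void main(String[] args) throws IOException, InterruptedException {
        boolean follow = false;
        String dir = null;
        for (String a : args) {
            if (a.equals("-f")) {
                follow = true;       // keep following changes
            } else {
                dir = a;             // directory holding changes* files
            }
        }
        if (dir == null) {
            System.err.println("usage: Changes [-f] dir");
            System.exit(1);
        }
        do {
            File[] files = new File(dir).listFiles((d, name) -> name.startsWith("changes"));
            if (files != null) {
                for (File f : files) {
                    // Process or forward the reported updates here (placeholder).
                    System.err.println("processing " + f);
                    Files.delete(f.toPath());  // remove the file once processed
                }
            }
            if (follow) {
                Thread.sleep(1000);  // simple polling; a real forwarder might watch the directory
            }
        } while (follow);
    }
}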
To forward changes, set the global fwd attribute to yes in the configuration, and add the forward program to the installed system as shown later.
When the system starts, the forward program will be started for each of the installed data servers, to forward or process the changes that those data servers report.
For example, with an xample.jar file including a Changes class reporting changes, and with an installed and stopped system, we can:
Rename the file to fwd.jar
and add it to the user library directory:
unix$ mv xample.jar fwd.jar
unix$ lx add fwd.jar ulib
cp: fwd.jar local:/usr/local/leanxcale/ulib
Here, ulib
is the user library directory where to install the fwd.jar
file.
Should the program be an executable file, we would use instead:
unix$ lx add fwd ubin
If the system was not configured to forward changes when installed, we can enable forwarding by
setting the fwd
attribute with the command:
unix$ lx config -s 'fwd=yes'
This sets the global fwd
property to yes
in the installed configuration.
The next time the system starts, the data servers will report changes as dictated by the fwd property, and the lx start command will start the forward processes by running fwd.jar.
To stop the forwarding processes temporarily, use lx stop:
unix$ lx stop fwd
To restart forwarding, use lx start:
unix$ lx start fwd
To update the forward program, stop forwarding, add the new program as shown above, and start forwarding again.
18. Using MFA with PAM
Each user requiring MFA must also configure their authenticator app (for instance the Google Authenticator app for Android).
In each user’s home directory, run the google-authenticator
command, specifying the time-based option:
unix$ google-authenticator -t
It will create a new secret key in $HOME/.google_authenticator
and display the new key both as a series of hexadecimal digits and as a QR code, to be entered into the authenticator app.
When using lx sql
, as well as the user name (with -n
) and password (with -p
) the current authenticator value should be
given with the -P
flag:
unix$ lx sql -n username -p password -P 123456
where 123456
should be replaced by the current code from the authenticator app.
19. Adding Hosts on LeanXcale Installs
To add more hosts to an existing install, edit the configuration file to add the extra hosts. The command:
unix$ lx config
can print the configuration if it is no longer available.
Once the new host(s) are added, run the install program to update the new hosts (and install them):
unix$ lxinst -u -f lxinst.conf
And finally, start the DB on the new host(s). For example, if we added a
host named blade123
, we can run:
unix$ lx start blade123
20. Reporting Issues
To report an issue, use lx report
to gather system information.
This program collects information from the system and
builds an archive to be sent to support.
unix$ lx report
report: lxreport.231009...
version...
procs...
logs...
stacks...
stack lxmeta100...
stack kvds103...
stack kvms100...
stack spread...
stack kvds102...
stack kvds100...
stack kvds101...
stack lxqe100...
# send this file to support.
-rw-rw-r-- 1 leandata leandata 54861 Oct 9 14:58 lxreport.231009.tgz
As printed by the command output, the resulting tar file should be sent to support.
The archive includes:
- installed version numbers
- underlying OS names and versions
- complete disk usage for the installed systems
- complete process list for the installed systems
- memory usage for the installed systems
- lx process list
- logs for components (last log file only, for each one)
- stacks for each component
- stacks for each core file found
When kvms is still running, the archive also includes:
- statistics for the system
- long list of kv resources
- process list for each kvds
- file list for each kvds