LeanXcale Developer’s Guide

This document provides code writing guidelines as well as information about development in the leanXcale development environment.

1. Guidelines

If you write a tool, write a manual for it, and add it to the root of the gitlab project using it. Documentation should use restructured-text like in this document. Borrow the source directives from this one. What is not in the manual, does not exist.
If you write a tool, make it print things only when whoever might be using it really needs to know something.
If you write a shell script, use clean, short names for the script, do not add any .sh suffix to its name, make it work from any directory, and make it print its usage if it is not called in the right way.
If you are doing something for a project, keep all the steps you are doing written on a per-project doc so nobody has to be present to run anything or re-run anything for that project. If the project has a gitlab project, make this document the README for it.
If you want to create something for LXMeta, add it in the TM project.
If you want to create something for the LXQE, add it in the qe_calcite project.
To log in Java, log by feature/component, do not log by class names. Before enabling logging to debug, consider moving the log name to one of these names, which are being set up as we go:
lxcm (conflict manager)
lxcs (commit sequencer)
lxlogger (db logger)
lxmeta (lxmeta process)
lxqe (lxqe process)
lxss (snapshot server)
lxtxn (transactions)
netinfo (/u stuff)
lxjdbc (jdbc requests and processing)
lxnet (network)
Before adding an external jar as a dependency, consider if writing the code to do what it does is just one or two hours of work. If that’s the case, do not use the external jar; write the code.
If you need an external jar that is not in the lx/libs project, download it (eg, let gradle download it for you), and then, ask the owner of lx/libs to add the jar there.
If you have code used for testing, that code has to be used only by your package. If code is required for use from other packages, it is not testing code, and has to be included always in your package.
We prefer tabs with size 4 and using braces around conditionals and loops. You can use the IJ.CODE.xml IntelliJ code style template (the style name is lx).
We added .idea/codeStyles/Project.xml to the v2.0 branch for most git repositories, and IntelliJ knows how to use it.
In C, use the style found in kv sources. In particular, variables are declared at the start of functions, without initialization in the declaration, and function names are placed at the start of a line.
Never define classes for a package at a gitlab project different than the gitlab project that defined that package in the first place.
Do not define interfaces for things that have a single implementation. Actually, if you find one, kill it.
Do not write expressions like these
```
if (0 < N)
if (null != x)
```
Instead, use the more conventional
```
if (N > 0)
if (x != null)
```

Do not write ifs that check for errors and put the general case into an else, like

if (something bad or weird happen) {
	special case or handle an error
} else {
	the usual code
}

Instead, fail early and leave the code in the main body, like in

if (something bad or weird happen) {
	special case or handle an error
	return (or throw)
}
the usual code here
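
As a concrete Java sketch of the same shape (PortParser and parsePort are hypothetical names used only for illustration), each error is handled and left behind at once, and the usual code stays in the main body:

```java
// Hypothetical example: parse a TCP port, failing early on bad input.
final class PortParser {
	static int parsePort(String s) {
		int port;

		if (s == null || s.isEmpty()) {
			throw new IllegalArgumentException("port: empty string");
		}
		try {
			port = Integer.parseInt(s);
		} catch (NumberFormatException e) {
			throw new IllegalArgumentException("port: " + s + ": not a number");
		}
		if (port < 1 || port > 65535) {
			throw new IllegalArgumentException("port: " + s + ": out of range");
		}
		// the usual code goes here, not inside an else
		return port;
	}

	public static void main(String[] args) {
		System.out.println(parsePort("6666"));
	}
}
```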

Do not nest too much, use local variables. And do not call the same method twice or more in the same function, use a local variable to cache the call, unless two calls are really needed.
Do not add wrappers. Kill wrapper functions that just call what we should have called. For example, if you want Conn.tbls(), you know where to find it. There is no need to use a wrapper method to call that.
If you have a global variable, use a global variable. That is: singletons are not objects, but static methods/members. Access them globally from the class, do not pass objects around that are not really objects.
Do not pass global variables as arguments. If a program has a global (eg, a server), keep it at a place where all the program can reach it. Do not pass the global to constructors or store it in members.
Global variables should have longer and explicit names. But not that long, eg., CONFMGR_QERESEND_DFLT is more than enough as a long name, thus, NEVER write things like:
```
CONFIGURATION_MANAGER_HAPROXY_RESEND_QE_INFO_PERIOD_DEFAULT_VALUE
```
Local variables and method arguments should have smaller, compact, names. For example, recSts can be enough, instead of kvdsRecoveryStatus, which leads to worse code.
Loop control variables should be named i, j, etc., as taught in every Programming-101 course around the world.
If you have a member that has both a get and a set method, or has a get method and is set only at the object creation time, and it is not locked, make the member public and remove getters/setters.
When using log4j methods to log diagnostics, apply the previous point and do not call functions in the log call, pass just objects or members to the call, so the log call does not require evaluating function calls unless it knows the logging level is enabled. At that point it will call toString for you. This can be used to get rid of isDebugEnabled and similar calls.
Do not put literature in log messages. Describe just what happens in a compact and regular way to permit machine processing of log files and make it easy to read the logs. A good message says what happened, where it happened, and the underlying reason. For example:
```
lxqe: user jim: authentication failed
setup: mkdir lib/utils: lib does not exist
```
To make it easy to use objects like transactions and the like in log calls, make their toString method return a simple and compact string for the object, like, for example, txn.0x43223 for a transaction with the given TID. This is just an example. Using this, the call can be simply:
```
log.debug("{}: aborting", txn);
```
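A minimal sketch of such a toString follows, assuming a hypothetical Txn class with a public tid member (this is not the actual transaction class). The println stands in for the log call, so the sketch does not need log4j in the class path:

```java
// Hypothetical transaction class; only its toString matters here.
final class Txn {
	public final long tid;

	Txn(long tid) {
		this.tid = tid;
	}

	// Compact, regular name for log lines: txn.0x<tid in hex>
	@Override
	public String toString() {
		return "txn.0x" + Long.toHexString(tid);
	}

	public static void main(String[] args) {
		Txn txn = new Txn(0x43223L);
		// Stand-in for log.debug("{}: aborting", txn)
		System.out.println(txn + ": aborting");
	}
}
```

Because the log call receives the object and not a string built by the caller, toString runs only if the message is actually formatted.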
Do not use mock. If you want to use a server that does something, for testing, write your own and use it for the test.
If you want a std server to behave in a particular way for testing, use a TESTING environment variable or something like that and make your server honor it. That is, use real things for testing, so you test a real thing.
Gradle tests must be added in a way that excluding tests from the build does actually exclude them from the build process.
If you need to define a parameter, e.g., number of connections, put a constant like MAXCONNS (or a similar, clean, compact name). Do not use properties for this. If needed, use an environment variable like LXMAXCONNS or better, a command line argument for the program involved. The install configuration has parameters and that is where parameters should be.
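
A minimal sketch of the constant plus environment variable approach, assuming a hypothetical Conns class and an arbitrary default of 64 (both are illustration only, not values used by the system):

```java
// Hypothetical holder for a connection limit parameter.
final class Conns {
	static final int MAXCONNS = 64;	// assumed default, for illustration only

	// Returns the number in env if it is a positive integer, or MAXCONNS otherwise.
	static int maxConns(String env) {
		int n;

		if (env == null || env.isEmpty()) {
			return MAXCONNS;
		}
		try {
			n = Integer.parseInt(env.trim());
		} catch (NumberFormatException e) {
			return MAXCONNS;
		}
		if (n <= 0) {
			return MAXCONNS;
		}
		return n;
	}

	public static void main(String[] args) {
		// No properties file involved: just the constant and the environment.
		System.out.println(maxConns(System.getenv("LXMAXCONNS")));
	}
}
```

With LXMAXCONNS unset, or set to something that is not a positive number, the constant is used.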

These are some changes worth noting:

ServiceIPs are going. Use NetInfo string names.
Addresses use the Addr class.
Sockets use the Sock class.

2. Gitlab and packages

The distribution source is hosted on different projects at the gitlab:

lxinst: Installer and shared gitlab CI templates
- git@gitlab.leanxcale.com:lx/lxinst.git
spread: network groups
- git@gitlab.leanxcale.com:lx/spread.git
kv: key-value store
- git@gitlab.leanxcale.com:lx/Kivi.git
- uses: spread
libs: 3rd party java libraries used
- git@gitlab.leanxcale.com:lx/libs.git
avatica: locally modified Apache Avatica (query engine JDBC stubs).
- git@gitlab.leanxcale.com:lx/avatica.git
- uses: libs
calcite: locally modified Apache Calcite (query engine).
- git@gitlab.leanxcale.com:lx/calcite.git
- uses: libs, avatica
TM: lxmeta sources and tools.
- git@gitlab.leanxcale.com:lx/TM.git
- uses: libs, kv, spread
qe_calcite: Query engine sources, besides calcite.
- git@gitlab.leanxcale.com:lx/qe_calcite.git
- uses: libs, kv, spread, TM, kivi-api, avatica, calcite
odata: OpenDATA server.
- git@gitlab.leanxcale.com:lx/odata.git
prom: prometheus binaries and tools.
- git@gitlab.leanxcale.com:lx/prom.git
graf: grafana binaries and tools.
- git@gitlab.leanxcale.com:lx/graf.git

The last three ones might not be installed if not asked for. The last two ones are for stats. They are big packages, beware.

Other packages include drivers for using the system:

lxpy: Python driver.
- git@gitlab.leanxcale.com:lx/lxpy.git
lxhibernate: Support for Hibernate.
- git@gitlab.leanxcale.com:lx/lxhibernate.git

And, another package includes system tests, and therefore depends on everything.

lxtest: System tests.
- git@gitlab.leanxcale.com:lx/lxtest.git

The documentation is kept in this package:

Documentation: user guides and lxinst and lxdev guides.
- git@gitlab.leanxcale.com:lx/Documentation.git

3. The distribution

There are three distributions, each one contains the lxinst program used to install that distribution, along with the rest of the distribution:

development or unstable Found at https://artifactory.leanxcale.com/artifactory/lxdist. This includes the most recent version for the packages, without any tests used as filters.
stable Found at https://artifactory.leanxcale.com/artifactory/lxstable. This is published at midnight but only if tests passed.
public Found at https://artifactory.leanxcale.com/artifactory/lxpublic. This is a release distribution, in zip format, using the packages included in the zip as the basis for the install.

In the development and stable distributions, there is a directory per version, for example v2.0. In the public distribution there is a zip file per distribution.

In the directory for a version, there are compressed tar files (always using a .tgz extension, and never .tar.gz).

Each package may supply a single tar file for each one of these:

portable files
machine dependent files for a given architecture or system
sources

For example, these are some file names:

kv.v2.0.Linux.x86_64.tgz
kv.v2.0.Darwin.arm64.tgz
kv.v2.0.port.tgz
kv.v2.0.src.tgz

In the public distribution, there is a zip file per distribution. For example, this may be a file name:

lx.2.0.231129.zip

The third number after the version is the date for the release, to identify releases published for bug fixes and the like.

3.1. Releases

By convention, the last release number where we are working on is the current development release. This is so even if we decided to publish a public distribution for it.

Previous releases are frozen, except for bug fixes, as soon as the new release starts.

We are in 2.3 as of today.

4. Reporting issues

When there is any problem with the system, please, follow these steps:

First, try to gather as much info as possible about your problem:

Get the status of the system with lx status and lx procs to see which processes are running and which ones are not running.
If a process died, or you know a process is not responsive, try to look at its log to see if it is reporting any problem. For example, lx logs -p lxqe prints the log for the QEs.
If the query engine is not responding, or it is failing, it might be because of a problem in kvds or kvms. Thus, try to look also at their logs to see if they are reporting any issue.
Should the problem happen when starting the system, the log for lxmeta is the one where you should look to see what that component is doing.
If the system is not responsive, try to get the stack for the components with lx stack.
If you got a core dump, try to get the stack dump from it using gdb. For example, if the QE crashed and you got a core.4324 file, use something like
```
gdb `which java` core.4324
gdb> thread apply all bt
```

Once you gathered all the possible info, save at least the logs for further inspection, and try to reproduce the issue with the smallest possible test that still reproduces the issue.

For example, reduce the number of table columns, the number of tables, the number of insertions, and so on, to the bare minimum and see if the issue still happens.

In many cases, it is possible to reproduce the issues with just a couple of requests, even if you found them after a 3 hour test run.

With the information at hand about the smallest test that reproduced the problem:

Open an issue for it and drop a line to tests@leanxcale.com about the problem.

5. Package conventions

To work on a particular package, it is important to know and follow the conventions used to build the whole system.

The ./lib directory keeps the libraries needed to build the package. Usually these come from other packages needed to build the one at hand (eg., jar files, property files, and dynamic libraries used).
The ./bin directory keeps programs needed to build the package, or to run it (eg, spread and kv binaries).
The LXBUILD script builds one or more of the following files, once libraries needed have been added to the ./lib directory.
A file named like pkg.v2.0.port.tgz (where pkg is the package built) contains portable files, that is, files for any architecture, and unpacks cleanly extracting files only at ./lib/ and ./bin, if any.
A file named like pkg.v2.0.Linux.x86_64.tgz (where pkg is the package built, and Linux is the system name, perhaps Darwin, and x86_64 is the architecture name, perhaps amd64) contains machine dependent files, that is, files for the given architecture, and unpacks cleanly extracting files only at ./lib/ and ./bin, if any.
A file named like pkg.v2.0.src.tgz contains the package source, and is expected to unpack cleanly under the ./src tree, if possible, or at least in a compatible way for everything else.

Not all these .tgz files must be built by LXBUILD. Only those needed.

Never create more than a single tar file for each architecture (including here port and src as architecture names).
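
The naming convention can be summarized in a few lines of Java. This is only a sketch of the convention as described here; TarName and its methods are hypothetical names and are not part of LXBUILD or lxinst:

```java
// Hypothetical helper that builds and checks tar file names for a package.
final class TarName {
	// arch is "port", "src", or system.architecture, such as "Linux.x86_64".
	static String tarName(String pkg, String vers, String arch) {
		return pkg + "." + vers + "." + arch + ".tgz";
	}

	// True if name looks like a tar file for pkg at version vers:
	// it must use the .tgz extension (never .tar.gz) and the pkg.vers. prefix.
	static boolean isTarFor(String name, String pkg, String vers) {
		return name.startsWith(pkg + "." + vers + ".") && name.endsWith(".tgz");
	}

	public static void main(String[] args) {
		System.out.println(tarName("kv", "v2.0", "port"));
		System.out.println(tarName("kv", "v2.0", "Darwin.arm64"));
	}
}
```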

Java packages CANNOT build using central repositories. That is, all dependencies must be previously downloaded at the ./lib directory.

This is important to preserve independence while working on a package, and also to keep things under control when multiple packages are being updated.

6. Working on a package

Clone your package
Download the .tgz files for the packages you depend on.
Extract them at your package root.
Have fun
Commit your changes and push them.

If you work on multiple packages at once, you can just use the LXBUILD script from one of them, after making changes, and then extract the tar files built at the root of the other package you are working on if it relies on those files. For example:

unix$ cd /u/spread
unix$ LXVERS=v2.0 ./LXBUILD
unix$ cd /u/kv
unix$ tar zxvf /u/spread/spread.v2.0.Darwin.arm64.tgz

7. Installing and using your own package

In general, if you want to use the whole system to test your package, you can just install the distribution:

unix$ lxinst blade124:~/xamplelx
...
install done.
Use these to run lx commands:
	blade124:/ssd/leandata/xamplelx/bin/lx
Or copy them to your bin at the host, or add the lx bin to your path:
# at blade124, add to your ~/.profile:
	export PATH=/ssd/leandata/xamplelx/bin:$PATH

Here, all the packages have been downloaded at ./lxdist, and the distribution has been installed at blade124:~/xamplelx.

We can start it as usual, for example:

unix$ export PATH=/ssd/leandata/xamplelx/bin:$PATH
unix$ lx start

And the same goes to use any other DB command.

To update your package with a new version, with the DB stopped, you can just go to the installed directory and unpack the tar files for your package:

unix$ cd /ssd/leandata/xamplelx
unix$ tar zxf ~/TM/TM.v2.0.port.tgz

It is probably easier to use the addlib command to copy and unpack the new package on the installed targets:

unix$ lx addlib ~/TM/TM.v2.0.port.tgz

Yet another use is to use addlib to add actual library files to the installed targets. For example, this achieves a similar effect to the previous command:

unix$ tar ztf ~/TM/TM.v2.0.port.tgz
lib/lxmeta-kivi-common.jar
lib/lxmeta-logger.jar
lib/lxmeta-elastic-mgr.jar
lib/lxmeta-console-common.jar
lib/lxmeta-console-client.jar
lib/lxmeta-CMS.jar
lib/lxmeta-commons.jar
lib/lxmeta-LIS.jar
lib/lxmeta-CS.jar
lib/lxmeta-tm-commons.jar
lib/lxmeta-tm-integration.jar
lib/lxmeta-ltm-kv.jar
lib/lxmeta-ltm-commons.jar
lib/lxmeta-CM.jar
lib/lxmeta-MM.jar
lib/lxmeta-SS.jar
unix$ rm -rf ./lib/
unix$ tar zxf ~/TM/TM.v2.0.port.tgz
unix$ lx addlib lib/*

See the install guide for more.

8. Example LXBUILD and Gradle

The LXBUILD script is given the LXVERS environment variable with the system version (currently v2.0). For example, it is called like in:

unix$ LXVERS=v2.0 LXBUILD

It is imperative that this script either works, or exits with a non-zero status.

To write your own LXBUILD, just copy one and adapt it to your needs.

This is an example LXBUILD file, used by the TM package as of today:

#!/bin/sh
PKG=TM
if [ -z $JAVA_HOME ] ; then
	if [ -d /Library/Java/JavaVirtualMachines ] ; then
		for f in /Library/Java/JavaVirtualMachines/*/Contents/Home ; do
			if $f/bin/java -version 2>&1 | grep 11.0 >/dev/null ; then
				export JAVA_HOME=$f
				export PATH=$JAVA_HOME/bin:$PATH
			fi
		done
	fi
	if [ -d /usr/lib/jvm ] ; then
		for f in /usr/lib/jvm/*jdk* ; do
			if $f/bin/java -version 2>&1 | grep 11.0 >/dev/null ; then
				export JAVA_HOME=$f
				export PATH=$JAVA_HOME/bin:$PATH
			fi
		done
	fi
fi
PATH=$JAVA_HOME/bin:$PATH:.
name="master"
case x$LXVERS in
xv[0-9]*)
	name=$LXVERS
	;;
esac
LXVERS=$name
OS=`uname`
ARCH=`uname -m`
DISTARCH=$OS.$ARCH
export LXVERS DISTARCH
TAR=$PKG.$name.port.tgz
LIBSTAR=$PKG.$name.libs.tgz
SRCTAR=lx$PKG.$name.src.tgz
TAR=$PKG.$name.port.tgz
gradlew clean || exit 1
gradlew build -x test || exit 1
rm -f lib/*0.0-SNAPSHOT.jar
jars=`find . -name 'lxmeta*.jar'`
paths=""
for j in $jars ; do
	jname=`basename $j`
	cp $j lib
	paths="$paths lib/$jname"
done
rm -f *.tgz
tar zcvf $TAR $paths
ls -l $TAR

Here, you might replace this with something else, to build using your preferred build tools:

gradlew clean || exit 1
gradlew build -x test || exit 1

For example, it could be:

make all || exit 1

Also, you must adapt the final part too:

# Just move your stuff under ./lib and ./bin
# and run tar zcvf YOURTARFILENAME lib/yourfilesonly bin/yourfilesonly
rm -f lib/*0.0-SNAPSHOT.jar
jars=`find . -name 'lxmeta*.jar'`
paths=""
for j in $jars ; do
	jname=`basename $j`
	cp $j lib
	paths="$paths lib/$jname"
done
rm -f *.tgz
tar zcvf $TAR $paths
ls -l $TAR

To use the gradle installed on the gitlab runner, instead of downloading it all times, you can put this early in the gradlew (gradle wrapper) of your project:

if [ -x /opt/gradle/gradle-7.6/bin/gradle ] ; then
	PATH=/opt/gradle/gradle-7.6/bin:$PATH
	exec /opt/gradle/gradle-7.6/bin/gradle "$@"
fi

This uses such gradle when it is found, and continues with the rest of gradlew otherwise.

9. Gitlab integration

To create updated packages for your package at the repository, a .gitlab-ci.yml file should be added. This file must include just this:

variables:
  DEPS: "libs spread kv"

include:
  - project: "lx/lxinst"
    ref: master
    file: "/dflt.yml"

stages:
  - make

In this example, the DEPS variable includes the packages needed to build the one at hand.

The gitlab will download the tar files for them, and unpack them, before running your LXBUILD script and uploading the resulting tars to the distribution.

10. Package tests

Package tests should not depend on anything outside the package.

When using gradle, use gradle test tasks for your tests.

Tests should not be built by default.

11. System tests

Most system tests are kept at the lx/lxtest.git gitlab project. This project includes tests written in Java and tools to run system tests (from any other package or language) and to update releases.

We describe system tests here and how to write system tests in Java in a following section.

System tests are:

Tests written in Java in lxtests, in a Main class that can be called from the command line, found in a package at com.leanxcale.tests (at src/main/java/com/leanxcale/tests). The test name is the package name. For example com.leanxcale.tests.rec1.Main is the main program for the rec1 test.
Tests written in whatever language desired, kept in a per-test directory with a command LXTEST found on that directory, to run the test. The directory name is the test name. For example lxpy/LXTEST is the main program for the lxpy test.

A comment at the start of a line in the Main.java or LXTEST file, with the format

// lxtest: flags...

or

# lxtest: flags...

describes the test flags. This comment is not needed if there are no flags to describe.

The known test flags are:

unix The test can run at unix (eg, on a blade outside of a docker install).
docker The test is meant to run on a docker install.
mirror The test is meant to run on a mirror install using another host for the mirror.
long The test is considered a long test, takes from 1 to 3 minutes.
vlong The test is considered a very long test, takes more than a few minutes, and it might be it takes hours.
skip The test is not ready and should not be used. (It is preferred to list the test name in the SKIP file instead).

When platforms (unix, mirror, docker) are not given, unix is taken by default. Otherwise, only the found platforms are used. A test is meant to run at the underlying host (considered unix) or at docker, or on a mirror setup with an external host for the mirror. By default, tests run only on the unix platform.

For example, this says that the test can run at unix, docker, and that it is a long test:

// lxtest: unix docker long
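
The flag rules above can be sketched in Java as follows. This only illustrates the comment format and the unix default. It is not the code used by runtests, and TestFlags and parse are hypothetical names:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical parser for the lxtest flags comment.
final class TestFlags {
	static final String[] PLATFORMS = {"unix", "docker", "mirror"};

	// Returns the flags in a "// lxtest: ..." or "# lxtest: ..." line,
	// or null if the line is not such a comment.
	static Set<String> parse(String line) {
		String rest;
		Set<String> flags = new LinkedHashSet<>();
		boolean hasplat = false;

		if (line.startsWith("// lxtest:")) {
			rest = line.substring("// lxtest:".length());
		} else if (line.startsWith("# lxtest:")) {
			rest = line.substring("# lxtest:".length());
		} else {
			return null;
		}
		for (String f : rest.trim().split("\\s+")) {
			if (!f.isEmpty()) {
				flags.add(f);
			}
		}
		for (String p : PLATFORMS) {
			if (flags.contains(p)) {
				hasplat = true;
			}
		}
		// When no platform is given, unix is taken by default.
		if (!hasplat) {
			flags.add("unix");
		}
		return flags;
	}

	public static void main(String[] args) {
		System.out.println(parse("// lxtest: unix docker long"));
	}
}
```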

For unix tests, the DB is installed at ./testdb on the project root. For docker tests, the containers have the usual lx1 host names.

For mirror tests, the mirror host must have exactly the same directory used for the local test execution.

To skip a test, we might use

// lxtest: skip

It is also possible to create a SKIP file listing tests to skip, one per line, at the root of the lxtest directory.

System tests are executed usually using bin/runtests as shown later.

11.1. Setting up the test environment

The gitlab project depends on all other projects and also on lxinst. You are expected to extract at least the lxinst tar from the distribution so that its programs and libraries are available for testing, and, to execute the tests, you have to have the libraries extracted at the conventional ./lib directory.

To run system tests, clone lxtest at the testing machine, eg, using a gitlab token so you can push things too:

TOKEN=USER:TOKEN
git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxtest.git
history -c

Now, checkout the version to test

cd lxtest
git checkout v2.3

And use mkrlse to download everything. Do not forget to set your $LXKEY variable so the distribution can be downloaded:

LXKEY=nemo:APA2Pixxxxx7YuQy
history -c
mkrlse download

If you plan to compile and publish the JDBC, python, and hibernate drivers, you have to clone these projects too. Otherwise you might skip this:

git clone https://$TOKEN@gitlab.leanxcale.com/lx/qe_calcite.git
git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxpy.git
git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxhibernate.git
history -c

Note the convention is to clone them within lxtest.

For python tests, you need venv. It is probably in your system. Should you need to install it, you will see a note like the following when building the lxpy project, as a reminder:

The virtual environment was not created successfully because ensurepip is not
available.  On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

apt install python3.8-venv

To run the ODBC tests, you must install the UNIX ODBC and protocol buffers for C packages and clone the project lxodbc as explained next. If you do not run those tests, you may skip the following two steps.

To install ODBC and protocol buffer stuff on your UNIX system use commands similar to these ones for Ubuntu:

sudo apt-get update
sudo apt-get install libssl-dev
sudo apt-get install unixodbc-dev
sudo apt-get install libprotobuf-c-dev
sudo apt-get install protobuf-c-compiler
sudo apt-get install uuid
sudo apt-get install uuid-dev

To clone the lxodbc project at the lxtest directory, do this:

git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxodbc
history -c

To run ldap tests, just install an example LDAP by executing

sudo bin/ldap/installldap

and then configure it as expected by the tests using

sudo bin/ldap/configldap

After you cloned the extra projects like qe_calcite and others, within the project lxtest, you might use mkrlse again to checkout and build them for use on the tests

mkrlse download

If you cloned also qe_calcite and lxpy, this will pull changes for them as well (otherwise only the distribution is downloaded). Also, this does pull lxtest itself for updates.

Before running any test, you need a license file, like everybody else. You might copy one to your home directory:

unix$ cp mylicense ~/.

Should you update the source of lxtests, you can build it by hand to use it:

unix$ LXBUILD
...
-rw-rw-r-- 1 leandata leandata 15220291 Nov  7 15:16 lxtest.v2.0.port.tgz
unix$ tar zxf lxtest.v2.0.port.tgz

11.2. Running system tests

To list or execute one or multiple tests, use bin/runtests:

usage: runtests [-h] [-D] [-a] [-m host] [-A] [-F] [-f flag] [-k] [-l] [-n no]
                [-p plat] [-r] [-s name] [-X testarg]
                ...

run lxtest tests

positional arguments:
  name        tests starting with

optional arguments:
  -h, --help  show this help message and exit
  -D          enable debug diags
  -a          include long tests
  -m host     include mirror tests using host
  -A          include long and vlong tests
  -F          stop on failures
  -f flag     only tests with flag
  -k          keep crime scene
  -l          list tests (long if repeated)
  -n no       set max test nb
  -p plat     platform (unix, docker, mirror)
  -r          run with relops
  -s name     start at test with name
  -X testarg  pass arg to test

The program operates on all tests found, unless (prefixes of) test names are given, in which case only matching tests are selected.

Tests are sorted by name before being listed or executed.
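
The selection rule can be sketched in Java as shown next. This is a simplified model of the behavior described here, not the runtests code, and Select is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of how test names are selected and ordered.
final class Select {
	// Returns all tests if no prefixes are given, or those starting with
	// any of the prefixes otherwise, sorted by name.
	static List<String> select(List<String> tests, List<String> prefixes) {
		List<String> out = new ArrayList<>();

		for (String t : tests) {
			if (prefixes.isEmpty()) {
				out.add(t);
				continue;
			}
			for (String p : prefixes) {
				if (t.startsWith(p)) {
					out.add(t);
					break;
				}
			}
		}
		Collections.sort(out);
		return out;
	}

	public static void main(String[] args) {
		List<String> tests = List.of("rec2", "aggnames", "rec1", "conflict1");
		System.out.println(select(tests, List.of("rec")));
	}
}
```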

Flags -l and -ll make the program list the tests and exit. The former lists just test names:

unix$ bin/runtests -l
aggnames
conflict1
...

The latter is a long listing, printing commands to run each one of the tests found and describing the flags found for them. Refer to the previous section for the known flags.

To list (or execute) tests that have a given flag, use option -f as in

unix$ bin/runtests -f long -ll
java -cp 'lib/*' com.leanxcale.tests.rec2.Main	#unix long
...

When test name prefixes are given as arguments, runtests considers tests with names starting with any of the arguments given. For example,

unix$ bin/runtests -ll aggn
java -cp 'lib/*' com.leanxcale.tests.aggnames.Main	#unix

Or, to long list recovery tests:

unix$ bin/runtests -ll rec
java -cp 'lib/*' com.leanxcale.tests.rec1.Main	#unix
java -cp 'lib/*' com.leanxcale.tests.rec2.Main	#unix long
...

When not listing, tests are executed, unless they have the skip flag.

A test has the skip flag if the test includes the flag in the lxtest comment, or if the test name is listed in the SKIP file kept at the root of the lxtest directory.

By default, docker and long tests are excluded from the run. Flag -a considers also docker and long tests, not just short ones, but does not include very long tests.

Flag -A considers all tests, including very long ones. Beware that such tests might take 10 hours for a single test in some cases.

For example:

unix$ bin/runtests rec
run    rec1            unix rec1.out ...
pass   rec1            test: 45s             total: 45s
run    rec2            unix rec2.out ...
pass   rec2            test: 1:39            total: 53s
run    rec3            unix rec3.out ...
FAILED rec3            test: 2:10            total: 30s

runs recovery tests (or those with a matching name) that are not long tests and run on unix.

Some tests are numbered, like rec1, rec2, etc. Some of them have many ones, and it may be desired to limit the number of tests to a given number. Flag -n may be used to stop running numbered tests at the given number. For example,

unix$ bin/runtests -n 2 rec

will not run tests with names rec3, rec4, etc.

When a test fails, the failure is reported but following tests continue to run. Use flag -F to make no further tests run upon failure. In this case, after a failure, the installed DB is kept as-is for inspection, after killing the processes.

It is possible to prevent tests from installing/starting/formatting the test DB and use an already started DB at the conventional ./testdb directory. To do so, set the LXTESTRUNNING environment variable.

Flag -k asks runtests not to kill the processes when a test fails, to attach a debugger or inspect them after running the test. This makes sense only if flag -F is used too.

Flag -s makes the program start tests at the given one. For example, after running

unix$ bin/runtests

or

unix$ bin/runtests rec

if the test rec3 failed and we interrupted the tests, we may run

unix$ bin/runtests -s rec3

or, respectively,

unix$ bin/runtests -s rec3 rec

to continue with the remaining tests.

For debugging, you can set the log properties as shown in the previous section, and you can also create a log4j.properties file at the top-level directory, which will be used and installed on the testing DB.

To attach a debugger to a java process, set the LXQEDEBUG environment variable or the LXMETADEBUG environment variable.

In this case, for LXQE, the debug ports will be 5005, 5006, etc. If the variable value is a component name, eg lxqe100, then only that process will have the debug port set.

The same happens for LXMETA, although ports here will be 5050, 5051, etc.

To enable kv tracing, just set the KVDEBUG environment variable.

To enable java debug flags (e.g., those used by Sock.java and other components), set the LXDEBUG environment variable. It works very much like KVDEBUG, but for java processes.

To run tests that use mirror installs, the name of the extra host must be supplied, which enables execution of mirror tests. For example:

bin/runtests -m blade107 ha01

runs the ha01 tests using blade107 as the mirror host for the current host.

12. Writing system tests in Java

For system tests written in Java, use the lx/lxtest.git gitlab project. Here we describe the conventions for adding tests; see above to learn how to actually execute tests.

Create a package in that project at src/main/java/com/leanxcale/tests for your tests. For example, com.leanxcale.tests.rec1 has a recovery test.

At that package, create a Main class to run the test with the usual main method. No class may be named Main other than the test in this package. The program must take flag -k to keep the processes started and the installed DB for inspection after the test. Take a look at com.leanxcale.tests.rec1.Main as an example.

The test MUST do an exit(0) if the test passes and exit(1) if the test fails (if the program terminates with an exception upon errors, that is fine). The tools LxTest.passed() and LxTest.failed() will take care of this if you use them.

A main program might be as follows:

public static void main(String[] args) {
	boolean keep = false;
	if (args != null && args.length > 0 && "-k".equals(args[0])) {
		keep = true;
	}
	try {
		// use this only if you want to set debug for java
		LxTest.log4j = new HashMap<>() {{
			put("lxlogger", "DEBUG");
			put("lxmeta", "DEBUG");
		}};
		// use this if you want to set certain variables for the run
		LxTest.configProps = "KVDEBUG\tDOOMPPII";
		// and install and run the test
		LxTest.install();
		runTest();
		LxTest.passed(keep);
	} catch(Exception e) {
		LxTest.failed(keep, e);
	}
}

Here, you can see how the log levels for a few components have been adjusted before installing (and starting) the DB and running the test.

The install, kill, format, and related methods from LxTest will do nothing if the environment variable LXTESTRUNNING is set. This can be used to run a test on an already running ./testdb database.

If you want to write several related tests, for example recovery tests, name their packages starting with a common prefix (eg, rec1, rec2, etc.).

If your tests want to share common code, you are free to create a shared package and put that code there (use the prefix for your test packages as the package name) or just put the shared code in the first test and use it at will.

13. Sequencing test events

The kv library includes tools that work with a kvseq server to sequence events for tests that need to coordinate. Events are single-line strings that are kept forever and are sequenced by the server receiving them.

The kvseq server may run at any host and will accept connections from other processes and convey events to them:

usage: kvseq [-a addr] [-n name]

To start it, run it with the default tcp!localhost!6666 address or supply your own listen address with flag -a:

unix$ kvseq &

Once started, programs that want to sequence events (or wait for them) may use these functions from the kv library (or their kv.Conn equivalents):

void kvseqinit(char *name, char *addr)

dials the server at addr and identifies as name (use something like qe100).

void kvseqat(char *s)

notifies that the event s did happen.

void kvseqwait(char *s, int nb)

blocks the caller until the event s did happen at least nb times.

int kvseqafter(char *s, int nb)

returns -1 (or false in java) if the event s did not happen at least nb times.

Events arrive.name and leave.name are automatically generated for clients as they dial the server and as they disconnect from it.

By convention, if $KVSEQ is set, NetInfo will initialize the client for java processes.

For example, for a particular test, a program might call

kvseqat("created my tree")

and another call

kvseqwait("created my tree", 0)

to wait for the former to create a tree. Or perhaps,

if (kvseqafter("created my tree", 3) >= 0) {
	xfatal("failing for testing");
}

to cause an abort if the former created its tree three times already.
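
The counting semantics described for these calls can be pictured with a small in-memory model. The sketch below is not the kv library or its kv.Conn equivalents: SeqModel and its methods are hypothetical names, there is no server involved, and it only mirrors the documented behavior (an event is counted each time it is notified, waiting blocks until the count reaches nb, and the check returns false in Java when it has not).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the event semantics; not the kv library.
final class SeqModel {
	private final Map<String, Integer> counts = new HashMap<>();

	// Record that event s happened once more and wake up any waiters.
	synchronized void at(String s) {
		counts.merge(s, 1, Integer::sum);
		notifyAll();
	}

	// True if event s happened at least nb times.
	synchronized boolean after(String s, int nb) {
		return counts.getOrDefault(s, 0) >= nb;
	}

	// Block the caller until event s happened at least nb times.
	synchronized void await(String s, int nb) throws InterruptedException {
		while (counts.getOrDefault(s, 0) < nb) {
			wait();
		}
	}

	public static void main(String[] args) throws Exception {
		SeqModel seq = new SeqModel();
		Thread creator = new Thread(() -> seq.at("created my tree"));
		creator.start();
		seq.await("created my tree", 1);	// returns once the creator notified
		creator.join();
		System.out.println(seq.after("created my tree", 1));	// prints true
		System.out.println(seq.after("created my tree", 3));	// prints false
	}
}
```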

The kvseq command may be used as a client by giving the -n option with the name to use for the client. For example:

unix$ kvseq -n cli2
dialed from 127.0.0.1:49510
write events, ?event to wait, @event to check
seq> hi
seq> there

will produce the events:

arrive.cli2
hi
there

Debug flag -O might be used to make the client report operations (events sent and received), but the main use of the command is to generate events from the shell.

14. Tests and releases

Releases are made of packages and should be published only when tests pass.

The lxtest project described before is the basis for running system tests and includes the mkrlse command to download, test, and publish releases.

As a convenience, blades 105, 107, and 110 have a testing environment set up as described in the previous section about running system tests.

The lxtest project is cloned at

blade105:~/lxtest23 for release v2.3
blade107:~/lxtest23 for release v2.3
blade110:~/lxtest22 for release v2.2

Plus, there is a crontab entry to run the mkrlse program every night:

leandata@blade105:~$ crontab -l
0 0 * * * sh -c 'test -x $HOME/lxtest21/mkrlse &&
(. $HOME/.bashrc ; cd $HOME/lxtest21 ; ./mkrlse) >$HOME/lxtest21/mkrlse.out 2>&1'

The crontab is a single line, folded here for readability.

The mkrlse program pulls changes, downloads the distribution, rebuilds the tests, runs them (excluding very long ones), updates the status at the gitlab, and when tests pass, publishes the downloaded distribution as the stable one.

The ~/lxtest* directories are used as a base to run mkrlse and/or bin/runtests for the releases of interest.

Usually, they include a clone of the qe_calcite project and of the kv project too, within lxtest. You can mostly forget about this, but it might be useful to know it.

The mkrlse command usage is:

usage ./mkrlse [public|steps...]
	no args:   download build tests newstable
	public:    getstable build newpublic
	steps:
		download: download the unstable distribution
		getstable: download the stable distribution
		getpublic: download the public release distribution
		build: extract the tars and builds for the downloaded version tests.
		tests: run the tests
		newstable: upload the stable release from ./lxdist/*
		newpublic: upload the public distribution and publish the drivers

It may be called with a series of steps, like in

mkrlse download build tests newstable

or without arguments (which would do exactly as shown in the above command).

As a convenience, public is understood to mean

mkrlse getstable build newpublic

which downloads the stable distribution, builds drivers and test code, and uploads the public distribution and its drivers.
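
The way arguments map to steps, as stated in the usage message, can be summarized with the following sketch. It is a model of the documented behavior only: mkrlse is a script whose source is not shown here, and MkrlseSteps is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the documented argument handling; not the mkrlse source.
final class MkrlseSteps {
	// Expand the command line arguments into the list of steps to run.
	static List<String> expand(String... args) {
		if (args.length == 0) {
			// no args: download build tests newstable
			return Arrays.asList("download", "build", "tests", "newstable");
		}
		if (args.length == 1 && args[0].equals("public")) {
			// public: getstable build newpublic
			return Arrays.asList("getstable", "build", "newpublic");
		}
		// otherwise, the arguments are the steps, in the order given
		return Arrays.asList(args);
	}

	public static void main(String[] args) {
		System.out.println(String.join(" ", expand(args)));
	}
}
```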

When tests are asked for, the command uses bin/runtests -a to run the tests. It is often handy to call bin/runtests by hand to run tests or to run only a subset of existing tests.

In what follows, we use ~/lxtest as the example directory, but note it might be ~/lxtest21 or some other one depending on the release.

The packages downloaded from the distribution are kept at the ~/lxtest/lxdist directory. Thus, to try out a different version of a particular package, we might

leandata@blade105$ cp kv.*.tgz ~/lxtest/lxdist
leandata@blade105$ cd ~/lxtest
leandata@blade105$ bin/runtests

to see how it behaves.

Note that calling mkrlse will download packages and overwrite those you have copied by hand at lxdist.

When tests pass, mkrlse pushes a tests passed commit to the gitlab, including the runtests.out file listing the tests executed.

When tests fail, a tests failed commit is pushed, and the output of the failed test is included and pushed in a FAILS.out file. This also triggers the failure of a pipeline stage on that project, so the event is notified.

Check any of these files or look at the blade for further inspection on the testing status for the system.

14.1. Updating Releases

After fixing a bug or making a change, you might want to update a release. The unstable distribution for a release is updated just by pushing the new version of the package source to the gitlab.

To test and publish the stable distribution, just run mkrlse. Refer to the previous section for options if you want to perform only some of the tasks (eg, just publish with the already downloaded packages).

leandata@blade105$ cd ~/lxtest	# or ~/lxtest23 or whatever
leandata@blade105$ mkrlse

The command will publish the stable release when tests pass, or report a failure and push the failure to the gitlab.

You can use mkrlse to update the public distribution from the one found on the stable distribution.

leandata@blade105$ mkrlse public

It is advisable to run all the tests (including the very long ones) in this case, before publishing it.

15. Debugging transactions

The lxtest project includes a command bin/qelogtxns useful to debug transactions and conflicts.

usage: qelogtxns [-h] [-f file] [-n] [-v] [-a] [-D] [-b]

check transactions in the lxqe log

options:
  -h, --help  show this help message and exit
  -f file     log file
  -n          dry run: just print txns
  -v          verbose
  -a          print all transactions
  -D          enable debug diags
  -b          checkout lxtest/conflict2 txns

To use it, enable debug diagnostics for both the query engine and kv. For example, in a standard test, you can do this using

LxTest.log4j = new HashMap<>() {{
	put("lxqe", "DEBUG");
	put("lxjdbc", "DEBUG");
	put("lxtxn", "DEBUG");
	put("lxcm", "DEBUG");
}};
LxTest.configProps = "KVDEBUG\tDOOI";

In any case, you can adjust the log4j.properties file and the KVDEBUG property in the installed configuration file with similar values.

It is advisable to run using a single query engine so the whole set of transactions is shown in a single log file.

After running the desired load, use qelogtxns to perform basic checks and report the transactions discovered in the log file. For example:

unix$ bin/qelogtxns -v -f testdb/log/lxqe100.240318.1356.log
txn.0x52082: rdonly ses 1 mode session
	start:  sts 335001	testdb/log/lxqe100.240318.1228.log:186
	commit: cts 0	testdb/log/lxqe100.240318.1228.log:186
txn.0x733c2: committed ses 2 mode session
	start:  sts 471001	testdb/log/lxqe100.240318.1228.log:568
	commit: cts 573002	testdb/log/lxqe100.240318.1228.log:739
	Rget txn.0x733c2 db-APP-ACCOUNT	cts 0	testdb/log/lxqe100.240318.1228.log:732
		tpl t573002	['1']	['1000']
	Tadd txn.0x733c2 db-APP-ACCOUNT	cts 573002	testdb/log/lxqe100.240318.1228.log:762
		tpl t573002	['1']	['1000']
		tpl t573002	['2']	['1000']
	check key key.0x8f563c8a98d3076d ok	testdb/log/lxqe100.240318.1228.log:655
		t0 -1 1|  1000
	check key key.0xc522be8ac634c7f0 ok	testdb/log/lxqe100.240318.1228.log:669
		t0 -1 2|  1000
...
checking...
checks ok

Here, each committed transaction is reported, along with the STS and CTS values. The final status for the transaction is reported after its name (e.g., rdonly or committed in this example).

Transactions that did not write and were aborted to avoid the (empty) commit overhead are reported as rdonly, and do not have a known CTS.

The line

start:  sts 335001	testdb/log/lxqe100.240318.1228.log:186

reports the start and the point in the log file where this did happen.

The line

commit: cts 0	testdb/log/lxqe100.240318.1228.log:186

reports the commit and the point in the log file where this did happen.

Lines like this (or the ones with Rscan)

Rget txn.0x733c2 db-APP-ACCOUNT	cts 0	testdb/log/lxqe100.240318.1228.log:732
	tpl t573002	['1']	['1000']

report the reads made, the values retrieved, and where in the log they happened.

Lines like

Tadd txn.0x733c2 db-APP-ACCOUNT	cts 573002	testdb/log/lxqe100.240318.1228.log:762
	tpl t573002	['1']	['1000']
	tpl t573002	['2']	['1000']

report the writes made, and the CTS used.

Conflicts checked out are shown as in

check key key.0x8f563c8a98d3076d ok	testdb/log/lxqe100.240318.1228.log:655
	t0 -1 1|  1000

It is also possible to run the tool to show all transactions, including those with (conflicts or) errors. For example:

unix$ bin/qelogtxns -a -f testdb/log/lxqe100.240318.1356.log

In this case, transactions are shown as in

txn.0x1992e2: errors ses 3 mode session
	start:  sts 1675001	testdb/log/lxqe100.240318.1356.log:554580
	commit: cts 0	testdb/log/lxqe100.240318.1356.log:554580
	Rget txn.0x1992e2 db-APP-ACCOUNT	cts 1669002	testdb/log/lxqe100.240318.1356.log:554605
		tpl t1669002	['5']	['940']
	check key key.0x9562fea221d06426 conflict	testdb/log/lxqe100.240318.1356.log:554688
		t0 - 5|  930

Note the errors status in the transaction header line, and also the conflict reported for a conflict check.
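
When the report is long, the header lines can be filtered by status with a few lines of code. The sketch below relies only on the header format visible in the examples above (name, status, ses, mode); TxnHeaders is a hypothetical helper and is not part of qelogtxns or the lxtest project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it assumes the header format shown in the examples.
final class TxnHeaders {
	// Example header: "txn.0x1992e2: errors ses 3 mode session"
	private static final Pattern HDR =
		Pattern.compile("(txn\\.0x[0-9a-f]+): (\\w+) ses (\\d+) mode (\\w+)");

	// Return the names of the transactions whose final status equals status.
	static List<String> withStatus(List<String> lines, String status) {
		List<String> names = new ArrayList<>();
		for (String ln : lines) {
			Matcher m = HDR.matcher(ln);
			// matches() requires the whole line to be a header line
			if (m.matches() && m.group(2).equals(status)) {
				names.add(m.group(1));
			}
		}
		return names;
	}

	public static void main(String[] args) {
		List<String> report = Arrays.asList(
			"txn.0x52082: rdonly ses 1 mode session",
			"\tstart:  sts 335001\ttestdb/log/lxqe100.240318.1228.log:186",
			"txn.0x733c2: committed ses 2 mode session",
			"txn.0x1992e2: errors ses 3 mode session");
		System.out.println(withStatus(report, "errors"));	// prints [txn.0x1992e2]
	}
}
```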

16. Building on RHEL

To build kv and spread for other systems, you might adapt and/or use AWSBUILD. This script (found in the lxinst repository) knows how to create an AWS instance, download kv and spread, copy them to the instance, build them there, and get back the package tar files.

Beware that, so far, this has been used only for RHEL. You need the AWS keys, plus the lxinst.pem file used for access to hosts installed by lxinst.

Beware that the script creates and removes directories for kv and spread. It is better to run it on a directory created just for this.

For example:

unix$ export AWS_ACCESS_KEY_ID=XXXXXX
unix$ export AWS_SECRET_ACCESS_KEY=XXXXXX
unix$ cp YOUR_LOCATION_FOR_SUCH_FILE ./lxinst.pem
unix$ ./AWSBUILD rhel x64_64
see AWSBUILD.log for detailed command output.
clone kv...
clone spread...
instance id i-01a2f42e5be4ebd98
instance ip 67.202.36.229
***CAUTION: instance might be left running upon failures
remove it running:
	 aws ec2 terminate-instances --region us-east-1 --instance-ids i-01a2f42e5be4ebd98

trap: INTR: bad trap
install packages at target...
copying kv/spread sources...
build spread...
build kv...
copy binaries back...
-rw-r--r-- 1 leandata leandata 50317810 Jul 17 15:42 kv.v2.0.Linux.x86_64.tgz
-rw-r--r-- 1 leandata leandata   197901 Jul 17 15:42 kv.v2.0.port.tgz
-rw-r--r-- 1 leandata leandata  9257326 Jul 17 15:42 kv.v2.0.src.tgz
-rw-r--r-- 1 leandata leandata  1856070 Jul 17 15:42 spread.v2.0.Linux.x86_64.tgz
removing instance i-01a2f42e5be4ebd98 ...

17. AWS and DNS at aws.leanxcale.com

The domain aws.leanxcale.com is delegated from the leanxcale DNS to the Amazon Route 53 zone of the same name.

All installs made with lxinst for AWS that use a tag set the tag to tag.aws.leanxcale.com, where tag is the value given for the awstag config property.

The gitlab project lx/awsdns.git contains a tool, mklambda, that creates event watchers and a lambda function to update A records in the DNS for instances that enter the running state.

To create the lambda at a region:

mklambda us-west-2

To remove the lambda at a region:

mklambda -d us-west-2

To create the lambda at all regions:

mklambda

To remove the lambda from all regions:

mklambda -d

If the lambda was already created at a region, it is not updated, and nothing is done.

18. Gitlab runners

For the gitlab, we use the drunner6 runner.

The gitlab source for the runner is at

git@gitlab.leanxcale.com:runners/drunner6.git

Just pull and add/remove things to the docker file.

Then, to update the runner:

image=drunner6
docker login registry.leanxcale.com/$image
docker build .
docker build --security-opt seccomp=unconfined .
id=XXX	# whatever id for the image docker printed
docker tag $id registry.leanxcale.com/$image
docker push registry.leanxcale.com/$image

19. Documentation

Refer to the

git@gitlab.leanxcale.com:lx/Documentation.git

project. The README file there describes what you need to install to update the docs and how to generate them from the source.

The docs are generated whenever you push a new version to the v2.0 branch or any other release branch.

To make changes on the docs,

check out your branch (e.g., v2.0)
```
git checkout v2.0
```
edit the files you want
look at your changes
```
git status
```
add the changed files to your commit
```
git add file.adoc
```

or perhaps

git add .	# adds everything changed under the current directory

commit your changes

git commit -m 'short message describing the changes'

pull changes made by others
```
git pull -v
```

If there are conflicts, resolve them by editing the conflicting files by hand, then use git add on them to mark the conflicts as resolved, and git commit to finish the merge.

push your changes for others
```
git push -v
```
If your commit number is 0dedf50b, and you want to apply the same set of changes to, say, branch v2.1, then use git cherry-pick to do so.
```
git checkout v2.1
git cherry-pick 0dedf50b
git push
```
do not forget to get back to your working branch
```
git checkout v2.0
```
To get a clue regarding the commits pushed, go to the gitlab project and use the Code/Repository Graph menu to get the tree of commits.

20. Moving to a New Release

Remember that, by convention, the latest release number we are working on is the current development release. This is so even if we decided to publish a public distribution for it.

To start a new release (e.g., v2.1), start with the current one (e.g., v2.0). Make sure that nobody is changing it, that it passed the tests, and that you either pulled the last changes for the packages involved or cloned them at the last version.

First, clone the lxinst package from the git and create the new branch.

unix$ git clone git@gitlab.leanxcale.com:lx/lxinst.git
unix$ cd lxinst
unix$ git checkout v2.0
unix$ git checkout -b v2.1

Then, edit the installer so it will fetch packages from the new version. Just change src/lxdefs and update the LXVERS and LXREPO definitions early in the file. For example,

LXVERS='v2.1'
LXRLSE="unstable"
LXREPO="https://artifactory.leanxcale.com:443/artifactory/lxdist/v2.1/"

Run LXBUILD to be sure it is ok and then push the new version.

unix$ LXBUILD
unix$ git push

The LXBUILD scripts in this and every other package will use the git branch as the version name, but, for safety, it may be a good idea to edit the default version number they use as well.

Now, go through the release packages and, for each one, create a branch with the new version, for example:

unix$ git checkout v2.0
unix$ git checkout -b v2.1
unix$ git push

It is a good idea to do this in order, using the package list shown early in this document.

You should also go to blade105, or wherever you are running daily tests, and update the setup to run the daily tests for the new development release.

One way is to update the existing setup to run the new release. You might also copy it to a different directory to keep the testing setup for the old one.

For example, rename the directory for the new release and fetch the new branches:

cd $HOME
mv lxtest20 lxtest21
cd lxtest21
git fetch && git checkout v2.1
cd qe_calcite ; git fetch && git checkout v2.1
cd ..
cd lxpy ; git fetch && git checkout v2.1
cd ..
cd kv ; git fetch && git checkout v2.1

Then edit the crontab to run the new release tests

crontab -e
... and change lxtest20 to lxtest21 ...
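
The names used in these steps follow a pattern that can be written down as a small helper. This is a sketch under assumptions: it takes the naming seen in the examples of this guide (lxtest21 for v2.1, and an LXREPO URL ending in the version name) to hold in general, which the guide does not state, and ReleaseNames is a hypothetical class that exists in no project.

```java
// Hypothetical helper; it only encodes the naming seen in the examples of this guide.
final class ReleaseNames {
	// "v2.1" -> "lxtest21": keep only the digits of the version name.
	// Assumption: versions keep a single digit per component.
	static String testDir(String vers) {
		return "lxtest" + vers.replaceAll("[^0-9]", "");
	}

	// "v2.1" -> the value used for LXREPO in src/lxdefs for that version.
	static String repoUrl(String vers) {
		return "https://artifactory.leanxcale.com:443/artifactory/lxdist/" + vers + "/";
	}

	public static void main(String[] args) {
		String vers = args.length > 0 ? args[0] : "v2.1";
		System.out.println(testDir(vers));
		System.out.println(repoUrl(vers));
	}
}
```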

The unstable distribution will be published directly by the pushes. To generate the stable distribution, go to the blade running the lxtest program mkrlse and run it. Do this in the branch for the new version, of course.

That will run the tests, and, when they pass, publish the new stable distribution for the new version.

Once the new version is ok for a new public distribution, use

mkrlse public

as usual.

21. Hunting leaks

To hunt for leaks, use a running system.

For Java processes, locate the process pid, e.g.,

unix$ jps
683466 LXMeta
683448 LXQE
672108 GradleDaemon
687379 Jps

Then, extract a heap histogram:

unix$ jmap -histo:live 683448 | sed 50q > map1.out

Here, 50 lines of output suffice unless you want to check out everything. After a while, giving the GC time to run and reclaim unused space, extract a new one:

unix$ jmap -histo:live 683448 | sed 50q  > map2.out

Then, you can use the jleak tool, found in the bin directory of the lxtest project, to compare both histograms and report the types whose total allocation grew, sorted by increase in size in bytes.

unix$ jleak map1.out map2.out
java.lang.invoke.DirectMethodHandle$Accessor: 349 objs 13960 bytes
java.lang.invoke.BoundMethodHandle$Species_L: 424 objs 13568 bytes
java.util.ArrayList: 564 objs 13536 bytes
java.util.zip.Inflater: 209 objs 13376 bytes

It is also useful to produce a full heap dump

unix$ jmap -dump:format=b,file=heap.dump 683448

And to inspect the resulting file with something like the Eclipse Memory Analyzer, which includes an automated report for suggested leaks.

In the jleak output above, there are 564 extra ArrayLists, with their total size increased by 13536 bytes. You’ll have to search for them in a heap dump if that is a suspected leak.
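
To make clear what such a comparison involves, here is a minimal sketch that diffs two histograms. It is not the jleak source: HistoDiff is a hypothetical name, and the code assumes the usual jmap -histo row layout (rank, instance count, byte count, class name), printing growth in the same style as the jleak output shown earlier.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; this is not the jleak implementation.
final class HistoDiff {
	// Matches rows such as "   1:   564   13536  java.util.ArrayList (java.base@17)".
	private static final Pattern ROW =
		Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

	// Parse histogram lines into class name -> {instances, bytes}.
	static Map<String, long[]> parse(List<String> lines) {
		Map<String, long[]> histo = new HashMap<>();
		for (String ln : lines) {
			Matcher m = ROW.matcher(ln);
			if (m.find()) {
				histo.put(m.group(3), new long[] {
					Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) });
			}
		}
		return histo;
	}

	// Report the classes whose byte count grew, largest growth first.
	static List<String> grown(Map<String, long[]> before, Map<String, long[]> after) {
		Map<String, long[]> delta = new HashMap<>();
		for (Map.Entry<String, long[]> e : after.entrySet()) {
			long[] old = before.getOrDefault(e.getKey(), new long[] {0, 0});
			long[] now = e.getValue();
			if (now[1] > old[1]) {
				delta.put(e.getKey(), new long[] {now[0] - old[0], now[1] - old[1]});
			}
		}
		List<String> names = new ArrayList<>(delta.keySet());
		names.sort(Comparator.comparingLong((String n) -> delta.get(n)[1]).reversed());
		List<String> out = new ArrayList<>();
		for (String n : names) {
			out.add(n + ": " + delta.get(n)[0] + " objs " + delta.get(n)[1] + " bytes");
		}
		return out;
	}

	public static void main(String[] args) throws IOException {
		if (args.length != 2) {
			System.err.println("usage: HistoDiff map1.out map2.out");
			System.exit(1);
		}
		Map<String, long[]> before = parse(Files.readAllLines(Paths.get(args[0])));
		Map<String, long[]> after = parse(Files.readAllLines(Paths.get(args[1])));
		grown(before, after).forEach(System.out::println);
	}
}
```

Because both histograms were cut to 50 lines, a type that appears only in the second one is reported as if it started from zero.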

Now, for C processes, and for the C part of the QE process, you can use memleax.

It can be installed as in

unix$ git clone https://github.com/WuBingzheng/memleax.git
unix$ cd memleax
# these are needed unless the packages have been installed
unix$ sudo apt install cmake
unix$ sudo apt-get install libunwind-dev
unix$ sudo apt-get install libelf-dev
unix$ sudo apt-get install libdwarf-dev
unix$ sudo apt-get install libdw-dev
unix$ mkdir build
unix$ cd build
unix$ cmake ..
unix$ make
unix$ sudo make install

Once available, you can run it against a running pid. It will intercept calls to memory allocation and deallocation, and report the allocations that survive longer than the given interval in seconds.

For example

unix$ memleax -e 60 683448