LeanXcale Developer’s Guide
This document provides code writing guidelines as well as information about development in the leanXcale development environment.
- 1. Guidelines
- 2. Gitlab and packages
- 3. The distribution
- 4. Reporting issues
- 5. Package conventions
- 6. Working on a package
- 7. Installing and using your own package
- 8. Example LXBUILD and Gradle
- 9. Gitlab integration
- 10. Package tests
- 11. System tests
- 12. Writing system tests in Java
- 13. Sequencing test events
- 14. Tests and releases
- 15. Debugging transactions
- 16. Building on RHEL
- 17. AWS and DNS at aws.leanxcale.com
- 18. Gitlab runners
- 19. Documentation
- 20. Moving to a New Release
- 21. Hunting leaks
1. Guidelines
-
If you write a tool, write a manual for it, and add it to the root of the gitlab project using it. Documentation should use restructured-text like in this document. Borrow the source directives from this one.
What is not in the manual, does not exist.
-
If you write a tool, make it print things only when whoever might be using it really needs to know something.
-
If you write a shell script, use a clean, short name for the script, do not add a .sh suffix to its name, make it work from any directory, and make it print its usage if it is not called the right way.
-
If you are doing something for a project, keep all the steps you are doing written on a per-project doc so nobody has to be present to run anything or re-run anything for that project.
If the project has a gitlab project, make this document the README for it.
-
If you want to create something for LXMeta, add it in the TM project.
-
If you want to create something for the LXQE, add it in the qe_calcite project.
-
To log in java, log by feature/component, do not log by class names. Before enabling logging to debug, consider moving the log name to one of these names, which are being set up as we go:
-
lxcm (conflict manager)
-
lxcs (commit sequencer)
-
lxlogger (db logger)
-
lxmeta (lxmeta process)
-
lxqe (lxqe process)
-
lxss (snapshot server)
-
lxtxn (transactions)
-
netinfo (/u stuff)
-
lxjdbc (jdbc requests and processing)
-
lxnet (network)
-
Before adding an external jar as a dependency, consider whether writing the code to do what it does is just one or two hours of work. If that is the case, write the code and do not use the external jar.
-
If you need an external jar that is not in the lx/libs project, download it (eg, let gradle download it for you), and then, ask the owner of lx/libs to add the jar there.
-
If you have code used for testing, that code has to be used only by your package. If code is required for use from other packages, it is not testing code, and has to be included always in your package.
-
We prefer tabs with size 4 and using braces around conditionals and loops. You can use the IJ.CODE.xml IntelliJ code style template (the style name is lx).
-
We added .idea/codeStyles/Project.xml to the v2.0 branch for most git repositories, and IntelliJ knows how to use it.
-
In C, use the style found in kv sources. In particular, variables are declared at the start of functions, without initialization in the declaration, and function names are placed at the start of a line.
-
Never define classes for a package at a gitlab project different than the gitlab project that defined that package in the first place.
-
Do not define interfaces for things that have a single implementation. Actually, if you find one, kill it.
-
Do not write expressions like these
if (0 < N)
if (null != x)
-
Instead, use the more conventional
if (N > 0)
if (x != null)
-
Do not write ifs that check for errors and put the general case into an else, like
if (something bad or weird happen) {
	special case or handle an error
} else {
	the usual code
}
-
Instead, fail early and leave the code in the main body, like in
if (something bad or weird happen) {
	special case or handle an error
	return (or throw)
}
the usual code here
-
Do not nest too much, use local variables. And do not call the same method twice or more in the same function, use a local variable to cache the call, unless two calls are really needed.
-
Do not add wrappers. Kill wrapper functions that just call what we should have called. For example, if you want Conn.tbls(), you know where to find it. There is no need to use a wrapper method to call that.
-
If you have a global variable, use a global variable. That is: singletons are not objects, but static methods/members. Access them globally from the class, do not pass objects around that are not really objects.
-
Do not pass global variables as arguments. If a program has a global (eg, a server), keep it at a place where all the program can reach it. Do not pass the global to constructors or store it in members.
-
Global variables should have longer and explicit names. But not that long, eg., CONFMGR_QERESEND_DFLT is more than enough as a long name, thus, NEVER write things like:
CONFIGURATION_MANAGER_HAPROXY_RESEND_QE_INFO_PERIOD_DEFAULT_VALUE
-
Local variables and method arguments should have smaller, compact, names. For example, recSts can be enough, instead of kvdsRecoveryStatus, which leads to worse code.
-
Loop control variables should be named i, j, etc., as taught in every Programming-101 course around the world.
-
If you have a member that has both a get and a set method, or has a get method and is set only at the object creation time, and it is not locked, make the member public and remove getters/setters.
-
When using log4j methods to log diagnostics, apply the previous point and do not call functions in the log call, pass just objects or members to the call, so the log call does not require evaluating function calls unless it knows the logging level is enabled. At that point it will call toString for you. This can be used to get rid of isDebugEnabled and similar calls.
-
Do not put literature in log messages. Describe just what happens in a compact and regular way to permit machine processing of log files and make the logs easy to read. A good message says what happened, where it happened, and the underlying reason. For example:
lxqe: user jim: authentication failed
setup: mkdir lib/utils: lib does not exist
-
To make it easy to use objects like transactions and the like in log calls, make their toString method return a simple and compact string for the object, like, for example, tnx.0x43223 for a transaction with the given TID. This is just an example. Using this, the call can be simply:
log.debug("{}: aborting", txn);
-
Do not use mock. If you want to use a server that does something, for testing, write your own and use it for the test.
-
If you want a std server to behave in a particular way for testing, use a TESTING environment variable or something like that and make your server honor it. That is, use real things for testing, so you test a real thing.
-
Gradle tests must be added in a way that excluding tests from the build does actually exclude them from the build process.
-
If you need to define a parameter, e.g., number of connections, put a constant like MAXCONNS (or a similar, clean, compact name). Do not use properties for this. If needed, use an environment variable like LXMAXCONNS or better, a command line argument for the program involved. The install configuration has parameters and that is where parameters should be.
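The guidelines above on failing early, caching calls in local variables, and compact constant names can be sketched in Java as follows. This is only an illustration: the class Conf, the method parsePort, and the constant MAXPORT are hypothetical names made up for this example and are not part of any LeanXcale package.

```java
// Hypothetical example: all the names here are for illustration only.
class Conf {
	static final int MAXPORT = 65535;

	// Fail early: handle the odd cases first and throw,
	// then leave the usual code in the main body, without an else.
	static int parsePort(String s) {
		if (s == null) {
			throw new IllegalArgumentException("port: null string");
		}
		String t = s.trim(); // cache the call in a local variable
		if (t.isEmpty()) {
			throw new IllegalArgumentException("port: empty string");
		}
		int n = Integer.parseInt(t);
		if (n < 1 || n > MAXPORT) {
			throw new IllegalArgumentException("port: " + n + ": out of range");
		}
		return n;
	}

	public static void main(String[] args) {
		System.out.println(parsePort(" 5005 "));
	}
}
```

The usual path reads top to bottom with no nesting, and each error message follows the compact "where: what: reason" shape suggested for log messages.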
These are some changes worth noting:
-
ServiceIPs are going. Use NetInfo string names.
-
Addresses use the Addr class.
-
Sockets use the Sock class.
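The logging guidelines in this section (pass objects instead of function results, and give objects a compact toString) can be illustrated with a small stand-alone sketch. The Txn class and the tiny Log class below are hypothetical stand-ins written for this example. Log only mimics the {} placeholder style of the log.debug call shown earlier and is not the log4j API.

```java
// Stand-in logger (an assumption for this sketch, not log4j).
class Log {
	static boolean enabled = false;

	// Formats the message only when debug is enabled, so toString
	// is never called for the argument of a disabled log call.
	static String debug(String fmt, Object arg) {
		if (!enabled) {
			return null;
		}
		String msg = fmt.replace("{}", String.valueOf(arg));
		System.err.println(msg);
		return msg;
	}

	public static void main(String[] args) {
		Log.enabled = true;
		Log.debug("{}: aborting", new Txn(0x43223));
	}
}

// Hypothetical transaction object with a compact string form.
class Txn {
	public final long tid; // set only at creation: public, no getter

	Txn(long tid) {
		this.tid = tid;
	}

	@Override
	public String toString() {
		return "txn.0x" + Long.toHexString(tid);
	}
}
```

Because the caller passes the object itself, there is no need to guard the call with a check of the logging level.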
2. Gitlab and packages
The distribution source is hosted on different projects at the gitlab:
-
lxinst: Installer and shared gitlab CI templates
-
git@gitlab.leanxcale.com:lx/lxinst.git
-
-
spread: network groups
-
git@gitlab.leanxcale.com:lx/spread.git
-
-
kv: key-value store
-
git@gitlab.leanxcale.com:lx/Kivi.git
-
uses: spread
-
-
libs: 3rd party java libraries used
-
git@gitlab.leanxcale.com:lx/libs.git
-
-
avatica: locally modified Apache Avatica (query engine JDBC stubs).
-
git@gitlab.leanxcale.com:lx/avatica.git
-
uses: libs
-
-
calcite: locally modified Apache Calcite (query engine).
-
git@gitlab.leanxcale.com:lx/calcite.git
-
uses: libs, avatica
-
-
TM: lxmeta sources and tools.
-
git@gitlab.leanxcale.com:lx/TM.git
-
uses: libs, kv, spread
-
-
qe_calcite: Query engine sources, besides calcite.
-
git@gitlab.leanxcale.com:lx/qe_calcite.git
-
uses: libs, kv, spread, TM, kivi-api, avatica, calcite
-
-
odata: OpenDATA server.
-
git@gitlab.leanxcale.com:lx/odata.git
-
-
prom: prometheus binaries and tools.
-
git@gitlab.leanxcale.com:lx/prom.git
-
-
graf: grafana binaries and tools.
-
git@gitlab.leanxcale.com:lx/graf.git
-
The last three ones might not be installed if not asked for. The last two ones are for stats. They are big packages, beware.
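The uses: entries listed above form a dependency graph, so a valid build order can be derived from them. The following Java sketch computes one such order with a depth-first visit. It is only an illustration under some assumptions: the class name BuildOrder is made up, kivi-api is left out because it is not one of the projects listed, and the graph is assumed to have no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BuildOrder {
	// Dependencies copied from the "uses:" entries of the project list.
	static Map<String, List<String>> deps() {
		Map<String, List<String>> d = new LinkedHashMap<>();
		d.put("spread", List.of());
		d.put("kv", List.of("spread"));
		d.put("libs", List.of());
		d.put("avatica", List.of("libs"));
		d.put("calcite", List.of("libs", "avatica"));
		d.put("TM", List.of("libs", "kv", "spread"));
		d.put("qe_calcite", List.of("libs", "kv", "spread", "TM", "avatica", "calcite"));
		return d;
	}

	// Adds a package only after all the packages it uses.
	static void visit(String pkg, Map<String, List<String>> d, List<String> order) {
		if (order.contains(pkg)) {
			return;
		}
		for (String dep : d.get(pkg)) {
			visit(dep, d, order);
		}
		order.add(pkg);
	}

	static List<String> order() {
		Map<String, List<String>> d = deps();
		List<String> order = new ArrayList<>();
		for (String pkg : d.keySet()) {
			visit(pkg, d, order);
		}
		return order;
	}

	public static void main(String[] args) {
		System.out.println(order());
	}
}
```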
Other packages include drivers for using the system:
-
lxpy: Python driver.
-
git@gitlab.leanxcale.com:lx/lxpy.git
-
-
lxhibernate: Support for Hibernate.
-
git@gitlab.leanxcale.com:lx/lxhibernate.git
-
And, another package includes system tests, and therefore depends on everything.
-
lxtest: System tests.
-
git@gitlab.leanxcale.com:lx/lxtest.git
-
The documentation is kept in this package:
-
Documentation: user guides and lxinst and lxdev guides.
-
git@gitlab.leanxcale.com:lx/Documentation.git
-
3. The distribution
There are three distributions. Each one contains the lxinst program used to install that distribution, along with the rest of the distribution:
-
development or unstable Found at https://artifactory.leanxcale.com/artifactory/lxdist. This includes the most recent version for the packages, without any tests used as filters.
-
stable Found at https://artifactory.leanxcale.com/artifactory/lxstable. This is published at midnight but only if tests passed.
-
public Found at https://artifactory.leanxcale.com/artifactory/lxpublic. This is a release distribution, in zip format, using the packages included in the zip as the basis for the install.
In the development and stable distributions, there is a directory per version, for example v2.0.
In the public distribution there is a zip file per distribution.
In the directory for a version, there are compressed tar files (always using a .tgz extension, and never .tar.gz).
Each package may supply a single tar file for each one of these:
-
portable files
-
machine dependent files for a given architecture or system
-
sources
For example, these are some file names:
kv.v2.0.Linux.x86_64.tgz
kv.v2.0.Darwin.arm64.tgz
kv.v2.0.port.tgz
kv.v2.0.src.tgz
In the public distribution, there is a zip
file per distribution.
For example, this may be a file name:
lx.2.0.231129.zip
The third number after the version is the date for the release, to identify releases published for bug fixes and the like.
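The naming scheme for distribution files can be summarized with a short Java sketch. The class DistNames and its methods are hypothetical helpers written for this example, and the zip parser assumes a two-part version number as in lx.2.0.231129.zip.

```java
// Hypothetical helpers, for illustration only.
class DistNames {
	// Name of the tar file for a package, e.g. kv.v2.0.port.tgz.
	// kind is "port", "src", or a system.architecture pair
	// like "Linux.x86_64".
	static String tarName(String pkg, String vers, String kind) {
		return pkg + "." + vers + "." + kind + ".tgz";
	}

	// Date field of a public zip name like lx.2.0.231129.zip.
	static String releaseDate(String zip) {
		String[] f = zip.split("\\.");
		if (f.length != 5 || !f[0].equals("lx") || !f[4].equals("zip")) {
			throw new IllegalArgumentException(zip + ": not a release zip name");
		}
		return f[3];
	}

	public static void main(String[] args) {
		System.out.println(tarName("kv", "v2.0", "Linux.x86_64"));
		System.out.println(releaseDate("lx.2.0.231129.zip"));
	}
}
```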
4. Reporting issues
When there is any problem with the system, please, follow these steps:
First, try to gather as much info as possible about your problem:
-
Get the status of the system with lx status and lx procs to see which processes are running and which ones are not running.
-
If a process died, or you know a process is not responsive, try to look at its log to see if it is reporting any problem. For example, lx logs -p lxqe prints the log for the QEs.
-
If the query engine is not responding, or it is failing, it might be because of a problem in kvds or kvms. Thus, try to look also at their logs to see if they are reporting any issue.
-
Should the problem happen when starting the system, the log for lxmeta is the one where you should look to see what that component is doing.
-
If the system is not responsive, try to get the stack for the components with lx stack.
-
If you got a core dump, try to get the stack dump from it using gdb. For example, if the QE crashed and you got a core.4324 file, use something like
gdb `which java` core.4324
gdb> thread apply all bt
Once you gathered all the possible info, save at least the logs for further inspection, and try to reproduce the issue with the smallest possible test that still reproduces the issue.
For example, reduce the number of table columns, the number of tables, the number of insertions, and so on, to the bare minimum and see if the issue still happens.
In many cases, it is possible to reproduce the issues with just a couple of requests, even if you found them after a 3 hour test run.
With the information at hand about the smallest test that reproduced the problem:
-
Open an issue for it and drop a line to tests@leanxcale.com about the problem.
5. Package conventions
To work on a particular package, it is important to know and follow the conventions used to build the whole system.
-
The ./lib directory keeps the libraries needed to build the package. Usually these come from other packages needed to build the one at hand (eg., jar files, property files, and dynamic libraries used).
-
The ./bin directory keeps programs needed to build the package, or to run it (eg, spread and kv binaries).
-
The LXBUILD script builds one or more of the following files, once libraries needed have been added to the ./lib directory.
-
A file named like pkg.v2.0.port.tgz (where pkg is the package built) contains portable files, that is, files for any architecture, and unpacks cleanly extracting files only at ./lib/ and ./bin, if any.
-
A file named like pkg.v2.0.Linux.x86_64.tgz (where pkg is the package built, and Linux is the system name, perhaps Darwin, and x86_64 is the architecture name, perhaps amd64) contains machine dependent files, that is, files for the given architecture, and unpacks cleanly extracting files only at ./lib/ and ./bin, if any.
-
A file named like pkg.v2.0.src.tgz contains the package source, and is expected to unpack cleanly under the ./src tree, if possible, or at least in a compatible way for everything else.
Not all these .tgz files must be built by LXBUILD. Only those needed.
Never create more than a single tar file for each architecture (including here port and src as architecture names).
Java packages CANNOT build using central repositories. That is, all dependencies must be previously downloaded to the ./lib directory.
This is important to preserve independence while working on a package, and also to keep things under control when multiple packages are being updated.
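The rule that portable and machine dependent tar files extract files only at ./lib and ./bin can be checked mechanically. The Java sketch below takes the entry names of a tar file (for example, as printed by tar ztf) and tells whether all of them are under lib/ or bin/. The class TarCheck is a hypothetical helper written for this example, not an existing tool.

```java
import java.util.List;

// Hypothetical checker, for illustration only.
class TarCheck {
	// True if every entry is under lib/ or bin/
	// (a leading ./ in the entry name is accepted).
	static boolean clean(List<String> names) {
		for (String n : names) {
			String p = n.startsWith("./") ? n.substring(2) : n;
			if (!p.startsWith("lib/") && !p.startsWith("bin/")) {
				return false;
			}
		}
		return true;
	}

	public static void main(String[] args) {
		System.out.println(clean(List.of("lib/lxmeta-CM.jar", "bin/lx")));
		System.out.println(clean(List.of("lib/lxmeta-CM.jar", "src/Main.java")));
	}
}
```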
6. Working on a package
-
Clone your package
-
Download the .tgz files for the packages you depend on.
-
Extract them at your package root.
-
Have fun
-
Commit your changes and push them.
If you work on multiple packages at once, you can just use the LXBUILD script from one of them, after making changes, and then extract the tar files built at the root of the other package you are working on if it relies on those files. For example:
unix$ cd /u/spread
unix$ LXVERS=v2.0 ./LXBUILD
unix$ cd /u/kv
unix$ tar zxvf /u/spread/spread.v2.0.Darwin.arm64.tgz
7. Installing and using your own package
In general, if you want to use the whole system to test your package, you can just install the distribution:
unix$ lxinst blade124:~/xamplelx
...
install done.
Use these to run lx commands:
	blade124:/ssd/leandata/xamplelx/bin/lx
Or copy them to your bin at the host, or add the lx bin to your path:
	# at blade124, add to your ~/.profile:
	export PATH=/ssd/leandata/xamplelx/bin:$PATH
Here, all the packages have been downloaded at ./lxdist
, and the
distribution has been installed at blade124:~/xamplelx
.
We can start it as usual, for example:
unix$ export PATH=/ssd/leandata/xamplelx/bin:$PATH
unix$ lx start
And the same goes to use any other DB command.
To update your package with a new version, with the DB stopped, you can just go to the installed directory and unpack the tar files for your package:
unix$ cd /ssd/leandata/xamplelx
unix$ tar zxf ~/TM/TM.v2.0.port.tgz
It is probably easier to use the addlib
command to copy and unpack
the new package on the installed targets:
unix$ lx addlib ~/TM/TM.v2.0.port.tgz
Yet another use is to use addlib
to add actual library files to the
installed targets.
For example, this achieves a similar effect to the previous command:
unix$ tar ztf ~/TM/TM.v2.0.port.tgz
lib/lxmeta-kivi-common.jar
lib/lxmeta-logger.jar
lib/lxmeta-elastic-mgr.jar
lib/lxmeta-console-common.jar
lib/lxmeta-console-client.jar
lib/lxmeta-CMS.jar
lib/lxmeta-commons.jar
lib/lxmeta-LIS.jar
lib/lxmeta-CS.jar
lib/lxmeta-tm-commons.jar
lib/lxmeta-tm-integration.jar
lib/lxmeta-ltm-kv.jar
lib/lxmeta-ltm-commons.jar
lib/lxmeta-CM.jar
lib/lxmeta-MM.jar
lib/lxmeta-SS.jar
unix$ rm -rf ./lib/
unix$ tar zxf ~/TM/TM.v2.0.port.tgz
unix$ lx addlib lib/*
See the install guide for more.
8. Example LXBUILD and Gradle
The LXBUILD script is given the LXVERS environment variable with the system version (currently v2.0). For example, it is called like in:
unix$ LXVERS=v2.0 LXBUILD
It is imperative that this script either works, or exits with a non-zero status.
To write your own LXBUILD, just copy one and adapt it to your needs.
This is an example LXBUILD file, used by the TM package as of today:
#!/bin/sh
PKG=TM
if [ -z $JAVA_HOME ] ; then
	if [ -d /Library/Java/JavaVirtualMachines ] ; then
		for f in /Library/Java/JavaVirtualMachines/*/Contents/Home ; do
			if $f/bin/java -version 2>&1 | grep 11.0 >/dev/null ; then
				export JAVA_HOME=$f
				export PATH=$JAVA_HOME/bin:$PATH
			fi
		done
	fi
	if [ -d /usr/lib/jvm ] ; then
		for f in /usr/lib/jvm/*jdk* ; do
			if $f/bin/java -version 2>&1 | grep 11.0 >/dev/null ; then
				export JAVA_HOME=$f
				export PATH=$JAVA_HOME/bin:$PATH
			fi
		done
	fi
fi
PATH=$JAVA_HOME/bin:$PATH:.
name="master"
case x$LXVERS in
xv[0-9]*)
	name=$LXVERS
	;;
esac
LXVERS=$name
OS=`uname`
ARCH=`uname -m`
DISTARCH=$OS.$ARCH
export LXVERS DISTARCH
TAR=$PKG.$name.port.tgz
LIBSTAR=$PKG.$name.libs.tgz
SRCTAR=lx$PKG.$name.src.tgz
TAR=$PKG.$name.port.tgz
gradlew clean || exit 1
gradlew build -x test || exit 1
rm -f lib/*0.0-SNAPSHOT.jar
jars=`find . -name 'lxmeta*.jar'`
paths=""
for j in $jars ; do
	jname=`basename $j`
	cp $j lib
	paths="$paths lib/$jname"
done
rm -f *.tgz
tar zcvf $TAR $paths
ls -l $TAR
Here, you might replace this with something else, to build using your preferred build tools:
gradlew clean || exit 1
gradlew build -x test || exit 1
For example, it could be:
make all || exit 1
Also, you must adapt the final part too:
# Just move your stuff under ./lib and ./bin
# and run tar zcvf YOURTARFILENAME lib/yourfilesonly bin/yourfilesonly
rm -f lib/*0.0-SNAPSHOT.jar
jars=`find . -name 'lxmeta*.jar'`
paths=""
for j in $jars ; do
	jname=`basename $j`
	cp $j lib
	paths="$paths lib/$jname"
done
rm -f *.tgz
tar zcvf $TAR $paths
ls -l $TAR
To use the gradle installed on the gitlab runner, instead of downloading it every time,
you can put this early in the gradlew
(gradle wrapper) of your project:
if [ -x /opt/gradle/gradle-7.6/bin/gradle ] ; then
	PATH=/opt/gradle/gradle-7.6/bin:$PATH
	exec /opt/gradle/gradle-7.6/bin/gradle "$@"
fi
This uses such gradle when it is found, and continues with the rest of gradlew
otherwise.
9. Gitlab integration
To create updated packages for your package at the repository,
a .gitlab-ci.yml
file should be added.
This file must include just this:
variables:
  DEPS: "libs spread kv"

include:
  - project: "lx/lxinst"
    ref: master
    file: "/dflt.yml"

stages:
  - make
In this example, the DEPS variable includes the packages needed to build the one at hand.
The gitlab will download the tar files for them, and unpack them, before running your LXBUILD script and uploading the resulting tars to the distribution.
10. Package tests
Package tests should not depend on anything outside the package.
When using gradle, use gradle test tasks for your tests.
Tests should not be built by default.
11. System tests
Most system tests are kept at the lx/lxtest.git
gitlab project.
This project includes tests written in Java and tools to run system tests
(from any other package or language) and to update releases.
We describe system tests here and how to write system tests in Java in a following section.
System tests are:
-
Tests written in Java in lxtest, in a Main class that can be called from the command line, found in a package at com.leanxcale.tests (at src/main/java/com/leanxcale/tests). The test name is the package name. For example, com.leanxcale.tests.rec1.Main is the main program for the rec1 test.
-
Tests written in whatever language desired, kept in a per-test directory with a command LXTEST found in that directory, to run the test. The directory name is the test name. For example, lxpy/LXTEST is the main program for the lxpy test.
A comment at the start of line in the Main.java
or LXTEST
file, with the format:
// lxtest: flags...
or
# lxtest: flags
describes the test flags. This comment is not needed if there are no flags to describe.
The known test flags are:
-
unix The test can run at unix (eg, on a blade outside of a docker install).
-
docker The test is meant to run on a docker install.
-
mirror The test is meant to run on a mirror install using another host for the mirror.
-
long The test is considered a long test, takes from 1 to 3 minutes.
-
vlong The test is considered a very long test; it takes more than a few minutes, and possibly hours.
-
skip The test is not ready and should not be used. (It is preferred to list the test name in the SKIP file instead).
When platforms (unix, mirror, docker) are not given, unix is taken by default.
Otherwise, only the found platforms are used.
A test is meant to run at the underlying host (considered unix
) or at
docker, or on a mirror setup with an external host for the mirror.
By default, tests run only on the unix platform.
For example, this says that the test can run at unix, docker, and that it is a long test:
// lxtest: unix docker long
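The flag comment format and the default platform rule described above can be sketched as a small Java parser. This is not the code used by the test tools. The class TestFlags and its parse method are hypothetical, written only to make the rules concrete.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical parser, for illustration only.
class TestFlags {
	static final List<String> PLATS = List.of("unix", "docker", "mirror");

	// Returns the flags in a line like "// lxtest: unix docker long"
	// or "# lxtest: long", or null if the line is not an lxtest comment.
	// When no platform flag is given, unix is added as the default.
	static Set<String> parse(String line) {
		String s;
		if (line.startsWith("//")) {
			s = line.substring(2).trim();
		} else if (line.startsWith("#")) {
			s = line.substring(1).trim();
		} else {
			return null;
		}
		if (!s.startsWith("lxtest:")) {
			return null;
		}
		Set<String> flags = new LinkedHashSet<>();
		for (String f : s.substring("lxtest:".length()).trim().split("\\s+")) {
			if (!f.isEmpty()) {
				flags.add(f);
			}
		}
		boolean hasplat = false;
		for (String p : PLATS) {
			if (flags.contains(p)) {
				hasplat = true;
			}
		}
		if (!hasplat) {
			flags.add("unix");
		}
		return flags;
	}

	public static void main(String[] args) {
		System.out.println(parse("// lxtest: unix docker long"));
		System.out.println(parse("# lxtest: long"));
	}
}
```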
For unix tests,
the DB is installed at ./testdb
on the project root.
For docker tests, the containers have the usual lx1
host names.
For mirror tests, the mirror host must have exactly the same directory used for the local test execution.
To skip a test, we might use
// lxtest: skip
It is also possible to create a SKIP
file listing tests to skip, one per line, at
the root of the lxtest
directory.
System tests are executed usually using bin/runtests
as shown later.
11.1. Setting up the test environment
The gitlab project depends on all other projects and also on lxinst
.
You are expected to extract at least the lxinst
tar from the distribution so
that its programs and libraries are available for testing, and, to execute
the tests, you have to have the libraries extracted at the conventional ./lib
directory.
To run system tests, clone lxtest
at the testing machine, eg, using a gitlab token
so you can push things too:
TOKEN=USER:TOKEN
git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxtest.git
history -c
Now, checkout the version to test
cd lxtest
git checkout v2.3
And use mkrlse
to download everything.
Do not forget to set your $LXKEY
variable so the distribution can be downloaded:
LXKEY=nemo:APA2Pixxxxx7YuQy
history -c
mkrlse download
If you plan to compile and publish the JDBC, python, and hibernate drivers, you have to clone these projects too. Otherwise you might skip this:
git clone https://$TOKEN@gitlab.leanxcale.com/lx/qe_calcite.git
git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxpy.git
git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxhibernate.git
history -c
Note the convention is to clone them within lxtest
.
For python tests, you need venv
. It is probably in your system.
Should you need to install it, you will see a note like the following when building
the lxpy
project, as a reminder:
The virtual environment was not created successfully because ensurepip is not available. On Debian/Ubuntu systems, you need to install the python3-venv package using the following command.
apt install python3.8-venv
To run the ODBC tests, you must install the UNIX ODBC and protocol buffers for C
packages and clone the
project lxodbc
as explained next.
If you do not run those tests, you may skip the following two steps.
To install ODBC and protocol buffer stuff on your UNIX system use commands similar to these ones for Ubuntu:
sudo apt-get update
sudo apt-get install libssl-dev
sudo apt-get install unixodbc-dev
sudo apt-get install libprotobuf-c-dev
sudo apt-get install protobuf-c-compiler
sudo apt-get install uuid
sudo apt-get install uuid-dev
To clone the lxodbc
project at the lxtest directory, do this:
git clone https://$TOKEN@gitlab.leanxcale.com/lx/lxodbc
history -c
To run ldap tests, just install an example LDAP by executing
sudo bin/ldap/installldap
and then configure it as expected by the tests using
sudo bin/ldap/configldap
After you cloned the extra projects like qe_calcite
and others, within
the project lxtest
, you might use mkrlse
again
to checkout and build them for use on the tests
mkrlse download
If you cloned also qe_calcite
and lxpy
, this will pull changes for them as well
(otherwise only the distribution is downloaded).
Also, this does pull lxtest
itself for updates.
Before running any test, you need a license file, like everybody else. You might copy one to your home directory:
unix$ cp mylicense ~/.
Should you update the source of lxtest
, you can build it by hand to use it:
unix$ LXBUILD
...
-rw-rw-r-- 1 leandata leandata 15220291 Nov 7 15:16 lxtest.v2.0.port.tgz
unix$ tar zxf lxtest.v2.0.port.tgz
11.2. Running system tests
To list or execute one or multiple tests, use bin/runtests
:
usage: runtests [-h] [-D] [-a] [-m host] [-A] [-F] [-f flag] [-k] [-l] [-n no] [-p plat] [-r] [-s name] [-X testarg] ...

run lxtest tests

positional arguments:
  name        tests starting with

optional arguments:
  -h, --help  show this help message and exit
  -D          enable debug diags
  -a          include long tests
  -m host     include mirror tests using host
  -A          include long and vlong tests
  -F          stop on failures
  -f flag     only tests with flag
  -k          keep crime scene
  -l          list tests (long if repeated)
  -n no       set max test nb
  -p plat     platform (unix, docker, mirror)
  -r          run with relops
  -s name     start at test with name
  -X testarg  pass arg to test
The program operates on all tests found, unless (prefixes of) test names are given, in which case only matching tests are selected.
Tests are sorted by name before being listed or executed.
Flags -l
and -ll
make the program list the tests and exit.
The former lists just test names:
unix$ bin/runtests -l
aggnames
conflict1
...
The latter is a long listing, printing commands to run each one of the tests found and describing the flags found for them. Refer to the previous section for the known flags.
To list (or execute) tests that have a given flag, use option -f
as in
unix$ bin/runtests -f vlong -ll
java -cp 'lib/*' com.leanxcale.tests.rec2.Main #unix long
...
When test name prefixes
are given as arguments, runtests
considers tests with names starting with
any of the arguments given.
For example,
unix$ bin/runtests -ll aggn
java -cp 'lib/*' com.leanxcale.tests.aggnames.Main #unix
Or, to long list recovery tests:
unix$ bin/runtests -ll rec
java -cp 'lib/*' com.leanxcale.tests.rec1.Main #unix
java -cp 'lib/*' com.leanxcale.tests.rec2.Main #unix long
...
When not listing, tests are executed, unless they have the skip
flag.
A test has the skip
flag if the test includes the flag in the lxtest
comment, or
if the test name is listed in the SKIP
file kept at the root of the lxtest
directory.
By default, docker and long tests are excluded from the run.
Flag -a
considers also docker and long tests, not just short ones,
but does not include very long tests.
Flag -A
considers all tests, including very long ones. Beware that such tests
might take 10 hours for a single test in some cases.
For example:
unix$ bin/runtests rec
run rec1 unix rec1.out
...
pass rec1 test: 45s total: 45s
run rec2 unix rec2.out
...
pass rec2 test: 1:39 total: 53s
run rec3 unix rec3.out
...
FAILED rec3 test: 2:10 total: 30s
runs recovery tests (or those with a matching name) that are not long tests and run on unix.
Some tests are numbered, like rec1
, rec2
, etc.
Some series have many tests, and it may be desirable to limit the number of tests
to a given number. Flag -n
may be used to stop running numbered tests at the
given number.
For example,
unix$ bin/runtests -n 2 rec
will not run tests with names rec3
, rec4
, etc.
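The selection rules described so far (sort by name, keep tests matching any prefix given, and drop numbered tests above the limit) can be sketched in Java as follows. This is not how runtests is implemented. The class Select and its methods are hypothetical, and the sketch assumes the test number is the run of digits at the end of the test name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical selection logic, for illustration only.
class Select {
	// Number at the end of a test name (rec3 -> 3), or -1 if none.
	static int testNb(String name) {
		int i = name.length();
		while (i > 0 && Character.isDigit(name.charAt(i - 1))) {
			i--;
		}
		if (i == name.length()) {
			return -1;
		}
		return Integer.parseInt(name.substring(i));
	}

	// Tests sorted by name, matching any prefix (all if none given),
	// without numbered tests above max (max <= 0 means no limit).
	static List<String> select(List<String> tests, List<String> prefixes, int max) {
		List<String> out = new ArrayList<>();
		for (String t : new TreeSet<>(tests)) {
			boolean match = prefixes.isEmpty();
			for (String p : prefixes) {
				if (t.startsWith(p)) {
					match = true;
				}
			}
			if (!match) {
				continue;
			}
			if (max > 0 && testNb(t) > max) {
				continue;
			}
			out.add(t);
		}
		return out;
	}

	public static void main(String[] args) {
		List<String> all = List.of("rec3", "aggnames", "rec1", "rec2", "conflict1");
		System.out.println(select(all, List.of("rec"), 2));
	}
}
```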
When a test fails, the failure is reported but following tests continue to run.
Use flag -F
to make no further tests run upon failure.
In this case, after a failure, the installed DB is kept as-is for
inspection, after killing the processes.
It is possible to prevent tests from installing/starting/formatting the test DB and
use an already started DB at the conventional ./testdb
directory.
To do so, set the LXTESTRUNNING
environment variable.
Flag -k
asks runtests
not to kill the processes when a test fails, to
attach a debugger or inspect them after running the test.
This makes sense only if flag -F
is used too.
Flag -s
makes the program start tests at the given one.
For example, after running
unix$ bin/runtests
or
unix$ bin/runtests rec
If the test rec3
failed and we interrupted the tests, we may run
unix$ bin/runtests -s rec3
or
unix$ bin/runtests -s rec3 rec
to continue with remaining tests.
For debugging, you can set the log properties as shown in the previous section, and you can
also create a log4j.properties
file at the top-level directory, which will be used
and installed on the testing DB.
To attach a debugger to a java process, set the LXQEDEBUG
environment variable or the LXMETADEBUG
environment variable.
In this case, for LXQE, the debug ports will be 5005, 5006, etc.
If the variable value is a component name, eg lxqe100
, then only
that process will have the debug port set.
The same happens for LXMETA, although ports here will be 5050, 5051, etc.
To enable kv tracing, just set the KVDEBUG environment variable.
To enable java debug flags (e.g., those used by Sock.java and other components), set the LXDEBUG environment variable. It works very much like KVDEBUG, but for java processes.
To run tests that use mirror installs, the name of the extra host must be supplied, which enables execution of mirror tests. For example:
bin/runtests -m blade107 ha01
runs the ha01
tests using blade107
as the mirror host for the current host.
12. Writing system tests in Java
For system tests written in Java, use the lx/lxtest.git
gitlab project.
Here we describe the conventions for adding tests, see above to learn
how to actually execute tests.
Create a package in that project at src/main/java/com/leanxcale/tests
for your
tests.
For example, com.leanxcale.tests.rec1
has a recovery test.
At that package, create a Main
class to run
the test with the usual main
method.
No class may be named Main
other than the test in this package.
The program must take flag -k
to keep the processes started and the
installed DB for inspection after the test.
Take a look at com.leanxcale.tests.rec1.Main
as an example.
The test MUST do an exit(0)
if the test passes and exit(1)
if the
test fails (if the program terminates with an exception upon errors, that is fine).
The tools LxTest.passed()
and LxTest.failed()
will take care of this
if you use them.
A main
program might be as follows:
public static void main(String[] args) {
	Boolean keep = false;
	if (args != null && args.length > 0 && "-k".equals(args[0])) {
		keep = true;
	}
	try {
		// use this only if you want to set debug for java
		LxTest.log4j = new HashMap<>() {{
			put("lxlogger", "DEBUG");
			put("lxmeta", "DEBUG");
		}};
		// use this if you want to set certain variables for the run
		LxTest.configProps = "KVDEBUG\tDOOMPPII";
		// and install and run the test
		LxTest.install();
		runTest();
		LxTest.passed(keep);
	} catch(Exception e) {
		LxTest.failed(keep, e);
	}
}
Here, you can see how the log levels for a few components have been adjusted before installing (and starting) the DB and running the test.
The install, kill, format, and related methods from LxTest
will do nothing if
the environment variable LXTESTRUNNING
is set.
This can be used to run a test on an already running ./testdb
database.
If you want to write several related tests, for example recovery tests,
name their packages starting with a common prefix (eg, rec1
, rec2
, etc.).
If your tests want to share common code, you are free to create a shared package and put that code there (use the prefix for your test packages as the package name) or just put the shared code on the first test and use it at will.
13. Sequencing test events
The kv library includes tools that work with a kvseq
server to permit sequencing
of events for tests that need to coordinate.
Events are single-line strings that are kept forever and are sequenced by the server
receiving them.
The kvseq
server may run at any host and will accept connections from other processes
and convey events to them:
usage: kvseq [-a addr] [-n name]
To start it, run it with the default tcp!localhost!6666
address or supply your own
listen address with flag -a
:
unix$ kvseq &
Once started, programs that want to sequence events (or wait for them) may use these
functions from the kv library (or their kv.Conn
equivalents):
void kvseqinit(char *name, char *addr)
dials the server at addr
and identifies as name
(use something like qe100
).
void kvseqat(char *s)
notifies that the event s
did happen.
void kvseqwait(char *s, int nb)
blocks the caller until the event s
did happen at least nb
times.
int kvseqafter(char *s, int nb)
returns -1 (or false in java) if the event s
did not happen at least nb
times.
Events arrive.name
and leave.name
are automatically generated for clients
as they dial the server and they disconnect from it.
By convention, if $KVSEQ
is set, NetInfo will initialize the client for java processes.
For example, for a particular test, a program might call
kvseqat("created my tree")
and another call
kvseqwait("created my tree", 0)
to wait for the former to create a tree. Or perhaps,
if (kvseqafter("created my tree", 3) >= 0) { xfatal("failing for testing"); }
to cause an abort if the former created its tree three times already.
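To make the semantics of these calls concrete, here is a small in-memory model in Java. It is only a sketch: it does not talk to a kvseq server and it is not the kv.Conn API. The class name EventCounter and its method names are hypothetical, chosen to mirror kvseqat, kvseqwait, and kvseqafter.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of the sequencing semantics; it does not dial kvseq.
final class EventCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Like kvseqat: records that the event happened once more.
    synchronized void at(String event) {
        counts.merge(event, 1, Integer::sum);
        notifyAll(); // wake up callers blocked in await
    }

    // Like the java flavor of kvseqafter: true if the event happened
    // at least nb times, false otherwise. It never blocks.
    synchronized boolean after(String event, int nb) {
        return counts.getOrDefault(event, 0) >= nb;
    }

    // Like kvseqwait: blocks until the event happened at least nb times.
    synchronized void await(String event, int nb) throws InterruptedException {
        while (counts.getOrDefault(event, 0) < nb) {
            wait();
        }
    }
}
```

In this model, one thread would call at("created my tree") after creating its tree, while another calls await("created my tree", 1) before using it.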
The kvseq
command may be used as a client by giving the -n
option with the name
to use for the client.
For example:
unix$ kvseq -n cli2
dialed from 127.0.0.1:49510
write events, ?event to wait, @event to check
seq> hi
seq> there
will produce the events:
arrive.cli2
hi
there
Debug flag -O
might be used to make the client report operations (events sent and
received), but the main use of the command is to generate events from the shell.
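The prompt syntax printed by the client (plain text to send an event, ?event to wait, @event to check) can be modelled as below. This is a hypothetical sketch of how such input lines might be classified, not the code of the kvseq command; SeqInput and its Kind values are made-up names.

```java
// Hypothetical model of one line typed at the kvseq client prompt.
final class SeqInput {
    enum Kind { EVENT, WAIT, CHECK }

    final Kind kind;
    final String event;

    private SeqInput(Kind kind, String event) {
        this.kind = kind;
        this.event = event;
    }

    // "?name" waits for an event, "@name" checks it, and anything else
    // is taken as an event to send.
    static SeqInput parse(String line) {
        String s = line.trim();
        if (s.startsWith("?")) {
            return new SeqInput(Kind.WAIT, s.substring(1).trim());
        }
        if (s.startsWith("@")) {
            return new SeqInput(Kind.CHECK, s.substring(1).trim());
        }
        return new SeqInput(Kind.EVENT, s);
    }
}
```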
14. Tests and releases
Releases are made of packages and should be published only when tests pass.
The lxtest
project described before is the basis for running system tests and includes
the mkrlse
command to download, test, and publish releases.
As a convenience, blades 105, 107, and 110 have a testing environment set up as described in the previous section about running system tests.
The lxtest
project is cloned at
-
blade105:~/lxtest23 for release v2.3
-
blade107:~/lxtest23 for release v2.3
-
blade110:~/lxtest22 for release v2.2
Plus, there is a crontab entry to run the mkrlse
program
every night:
leandata@blade105:~$ crontab -l
0 0 * * * sh -c 'test -x $HOME/lxtest21/mkrlse &&
    (. $HOME/.bashrc ; cd $HOME/lxtest21 ; ./mkrlse)
    >$HOME/lxtest21/mkrlse.out 2>&1'
The crontab is a single line, folded here for readability.
The mkrlse
program pulls changes, downloads the distribution, rebuilds the tests,
runs them (excluding very long ones), updates the status at the gitlab, and when tests pass,
publishes the downloaded distribution as the stable one.
The ~/lxtest*
directories are used as a base to run mkrlse
and/or bin/runtests
for the releases of interest.
Usually, they include a clone of the qe_calcite
project and of
the kv
project too, within lxtest
. You can mostly forget about this,
but it might be useful to know it.
The mkrlse
command usage is:
usage ./mkrlse [public|steps...]
    no args: download build tests newstable
    public: getstable build newpublic
    steps:
        download: download the unstable distribution
        getstable: download the stable distribution
        getpublic: download the public release distribution
        build: extract the tars and builds for the downloaded version tests.
        tests: run the tests
        newstable: upload the stable release from ./lxdist/*
        newpublic: upload the public distribution and publish the drivers
It may be called with a series of steps, like in
mkrlse download build tests newstable
or without arguments (which would do exactly as shown in the above command).
As a convenience, public
is understood to mean
mkrlse getstable build newpublic
which downloads the stable distribution, builds drivers and test code, and uploads the public distribution and its drivers.
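The mapping from arguments to steps can be summarized with a short sketch. It only models what the usage message states (no arguments and public are shorthands for fixed step lists); it is not the mkrlse script, and the class name MkrlseSteps is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Models how mkrlse arguments map to steps, following its usage message.
final class MkrlseSteps {
    static final List<String> DEFAULT =
        Arrays.asList("download", "build", "tests", "newstable");
    static final List<String> PUBLIC =
        Arrays.asList("getstable", "build", "newpublic");

    private MkrlseSteps() {}

    static List<String> expand(List<String> args) {
        if (args.isEmpty()) {
            return DEFAULT; // no args: the nightly sequence
        }
        if (args.size() == 1 && args.get(0).equals("public")) {
            return PUBLIC; // shorthand for publishing the stable distribution
        }
        return args; // an explicit list of steps is run as given
    }
}
```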
When tests
are asked for,
the command uses bin/runtests -a
to run the tests.
It is often handy to call bin/runtests
by hand to run tests or to run only a subset
of existing tests.
In what follows, we use ~/lxtest
as the example directory, but note it might be
~/lxtest21
or some other one depending on the release.
The packages downloaded from the distribution are kept at the ~/lxtest/lxdist
directory.
Thus, to try out a different version of a particular package, we might
leandata@blade105$ cp kv.*.tgz ~/lxtest/lxdist
leandata@blade105$ cd ~/lxtest
leandata@blade105$ bin/runtests
to see how it behaves.
Note that calling mkrlse
will download packages and overwrite those you have copied
by hand at lxdist
.
When tests pass, mkrlse
pushes a tests passed
commit to the gitlab,
including the runtests.out
file listing the tests executed.
When tests fail, a tests failed
commit is pushed, and the output of the
failed test is included and pushed in a FAILS.out
file.
This also triggers the failure of a pipeline stage on that project, so the
event is notified.
Check any of these files or look at the blade for further inspection on the testing status for the system.
14.1. Updating Releases
After fixing a bug or making a change, you might want to update a release. The unstable distribution for a release is updated just by pushing the new version of the package source to the gitlab.
To test and publish the stable distribution, just run mkrlse
.
Refer to the previous section for options if you want to perform only some of the
tasks (eg, just publish with the already downloaded packages).
leandata@blade105$ cd ~/lxtest # or ~/lxtest23 or whatever
leandata@blade105$ mkrlse
The command will publish the stable release when tests pass, or report a failure and push the failure to the gitlab.
You can use mkrlse
to update the public distribution from the one found on the
stable distribution.
leandata@blade105$ mkrlse public
It is advisable to run all the tests (including the very long ones) in this case, before publishing it.
15. Debugging transactions
The lxtest
project includes a command bin/qelogtxns
useful to debug transactions
and conflicts.
usage: qelogtxns [-h] [-f file] [-n] [-v] [-a] [-D] [-b]
check transactions in the lxqe log
options:
    -h, --help  show this help message and exit
    -f file     log file
    -n          dry run: just print txns
    -v          verbose
    -a          print all transactions
    -D          enable debug diags
    -b          checkout lxtest/conflict2 txns
To use it, enable debug diagnostics for both the query engine and kv. For example, in a standard test, you can do this using
LxTest.log4j = new HashMap<>() {{
    put("lxqe", "DEBUG");
    put("lxjdbc", "DEBUG");
    put("lxtxn", "DEBUG");
    put("lxcm", "DEBUG");
}};
LxTest.configProps = "KVDEBUG\tDOOI";
In any case, you can
adjust the log4j.properties
file and the KVDEBUG
property in the installed
configuration file with similar values.
It is advisable to run using a single query engine so the whole set of transactions is shown in a single log file.
After running the desired load, use qelogtxns
to perform basic checks and report
the transactions discovered in the log file.
For example:
unix$ bin/qelogtxns -v -f testdb/log/lxqe100.240318.1356.log
txn.0x52082: rdonly ses 1 mode session
start: sts 335001 testdb/log/lxqe100.240318.1228.log:186
commit: cts 0 testdb/log/lxqe100.240318.1228.log:186
txn.0x733c2: committed ses 2 mode session
start: sts 471001 testdb/log/lxqe100.240318.1228.log:568
commit: cts 573002 testdb/log/lxqe100.240318.1228.log:739
Rget txn.0x733c2 db-APP-ACCOUNT cts 0 testdb/log/lxqe100.240318.1228.log:732
tpl t573002 ['1'] ['1000']
Tadd txn.0x733c2 db-APP-ACCOUNT cts 573002 testdb/log/lxqe100.240318.1228.log:762
tpl t573002 ['1'] ['1000']
tpl t573002 ['2'] ['1000']
check key key.0x8f563c8a98d3076d ok testdb/log/lxqe100.240318.1228.log:655
t0 -1 1| 1000
check key key.0xc522be8ac634c7f0 ok testdb/log/lxqe100.240318.1228.log:669
t0 -1 2| 1000
...
checking...
checks ok
Here, each committed transaction is reported, along with the STS and CTS values.
The final status for the transaction is reported after its name (e.g., rdonly
or
committed
in this example).
Transactions that did not write and were aborted to avoid the (empty) commit overhead are
reported as rdonly
, and do not have a known CTS.
The line
start: sts 335001 testdb/log/lxqe100.240318.1228.log:186
reports the start and the point in the log file where this did happen.
The line
commit: cts 0 testdb/log/lxqe100.240318.1228.log:186
reports the commit and the point in the log file where this did happen.
Lines like this (or the ones with Rscan
)
Rget txn.0x733c2 db-APP-ACCOUNT cts 0 testdb/log/lxqe100.240318.1228.log:732
tpl t573002 ['1'] ['1000']
report the reads made, the values retrieved, and where in the log they happened.
Lines like
Tadd txn.0x733c2 db-APP-ACCOUNT cts 573002 testdb/log/lxqe100.240318.1228.log:762
tpl t573002 ['1'] ['1000']
tpl t573002 ['2'] ['1000']
report the writes made, and the CTS used.
Conflict checks are shown as in
check key key.0x8f563c8a98d3076d ok testdb/log/lxqe100.240318.1228.log:655
t0 -1 1| 1000
It is also possible to run the tool to show all transactions, including those with (conflicts or) errors. For example:
unix$ bin/qelogtxns -a -f testdb/log/lxqe100.240318.1356.log
In this case, transactions are shown as in
txn.0x1992e2: errors ses 3 mode session
start: sts 1675001 testdb/log/lxqe100.240318.1356.log:554580
commit: cts 0 testdb/log/lxqe100.240318.1356.log:554580
Rget txn.0x1992e2 db-APP-ACCOUNT cts 1669002 testdb/log/lxqe100.240318.1356.log:554605
tpl t1669002 ['5'] ['940']
check key key.0x9562fea221d06426 conflict testdb/log/lxqe100.240318.1356.log:554688
t0 - 5| 930
Note the errors
status in the transaction header line, and also the conflict
reported
for a conflict check.
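When post-processing this output, for example to list only the transactions that ended with errors, the header lines can be picked apart with a regular expression. The sketch below assumes the header layout is exactly the one shown in the examples above (name, status, ses number, mode). TxnHeader is a hypothetical helper and is not part of qelogtxns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for header lines such as
// "txn.0x1992e2: errors ses 3 mode session".
final class TxnHeader {
    private static final Pattern HEADER = Pattern.compile(
        "(txn\\.0x[0-9a-fA-F]+):\\s+(\\w+)\\s+ses\\s+(\\d+)\\s+mode\\s+(\\w+)");

    final String id;
    final String status;
    final int session;
    final String mode;

    private TxnHeader(String id, String status, int session, String mode) {
        this.id = id;
        this.status = status;
        this.session = session;
        this.mode = mode;
    }

    // Returns null when the line is not a transaction header.
    static TxnHeader parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new TxnHeader(m.group(1), m.group(2),
            Integer.parseInt(m.group(3)), m.group(4));
    }
}
```

A caller could read the output line by line, keep the headers for which status is errors, and print their id values.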
16. Building on RHEL
To build kv and spread for other systems, you might adapt and/or use AWSBUILD
.
This script (found on the lxinst
repository) knows how to create an AWS instance,
download and copy kv and spread to it, and get back the package tar files.
So far, this has been used only for RHEL, beware.
You need the AWS keys, plus the lxinst.pem
file used for access to lxinst
installed hosts.
Beware that the script creates and removes directories for kv
and spread
.
It is better to run it on a directory created just for this.
For example:
unix$ export AWS_ACCESS_KEY_ID=XXXXXX
unix$ export AWS_SECRET_ACCESS_KEY=XXXXXX
unix$ cp YOUR_LOCATION_FOR_SUCH_FILE ./lxinst.pem
unix$ ./AWSBUILD rhel x64_64
see AWSBUILD.log for detailed command output.
clone kv...
clone spread...
instance id i-01a2f42e5be4ebd98
instance ip 67.202.36.229
***CAUTION: instance might be left running upon failures
remove it running:
aws ec2 terminate-instances --region us-east-1 --instance-ids i-01a2f42e5be4ebd98

trap: INTR: bad trap
install packages at target...
copying kv/spread sources...
build spread...
build kv...
copy binaries back...
-rw-r--r-- 1 leandata leandata 50317810 Jul 17 15:42 kv.v2.0.Linux.x86_64.tgz
-rw-r--r-- 1 leandata leandata 197901 Jul 17 15:42 kv.v2.0.port.tgz
-rw-r--r-- 1 leandata leandata 9257326 Jul 17 15:42 kv.v2.0.src.tgz
-rw-r--r-- 1 leandata leandata 1856070 Jul 17 15:42 spread.v2.0.Linux.x86_64.tgz
removing instance i-01a2f42e5be4ebd98
...
17. AWS and DNS at aws.leanxcale.com
The domain aws.leanxcale.com
is delegated from the leanxcale DNS to the
Amazon Route 53 zone of the same name.
All installs using lxinst
for AWS using a tag, set the tag to tag.aws.leanxcale.com
where tag
is the tag given as the value for the awstag
config property.
The gitlab project lx/awsdns.git
contains a tool, mklambda
that
creates event watchers and a lambda function to update A records in the DNS
for instances that become running
.
To create the lambda at a region:
mklambda us-west-2
To remove the lambda at a region:
mklambda -d us-west-2
To create the lambda at all regions:
mklambda
To remove the lambda from all regions:
mklambda -d
If the lambda was already created at a region, it is not updated, and nothing is done.
18. Gitlab runners
For the gitlab, we use the drunner6
runner.
The gitlab source for the runner is at
git@gitlab.leanxcale.com:runners/drunner6.git
Just pull and add/remove things to the docker file.
Then, to update the runner:
image=drunner6
docker login registry.leanxcale.com/$image
docker build .
docker build --security-opt seccomp=unconfined .
id=XXX # whatever id for the image docker printed
docker tag $id registry.leanxcale.com/$image
docker push registry.leanxcale.com/$image
19. Documentation
Refer to the
git@gitlab.leanxcale.com:lx/Documentation.git
project. The README file there describes what you need to install to update the docs and how to generate them from the source.
The docs are generated whenever you push a new version to the v2.0
branch
or any other release branch.
To make changes on the docs,
-
check out your branch (e.g., v2.0)
git checkout v2.0
-
edit the files you want
-
look at your changes
git status
-
add the changed files to your commit
git add file.adoc
-
or perhaps
git add . # adds everything changed under the current directory
-
commit your changes
git commit -m 'short message describing the changes'
-
pull changes made by others
git pull -v
If there are conflicts, resolve them by editing by hand the conflicting files and then
using git add
on them to resolve the conflicts, and git commit
to finish the merge.
-
push your changes for others
git push -v
-
If your commit number is 0dedf50b, and you want to apply the same set of changes to, say, branch v2.1, then use
git cherry-pick
to do so.
git checkout v2.1
git cherry-pick 0dedf50b
git push
-
do not forget to get back to your working branch
git checkout v2.0
-
To get a clue regarding the commits pushed, go to the gitlab project and use the
Code
/Repository Graph
menu to get the tree of commits.
20. Moving to a New Release
Remember that, by convention, the latest release number we are working on is the current development release. This holds even if we decided to publish a public distribution for it.
To start a new release (e.g., v2.1
), start with the current one (e.g., v2.0
).
Make sure nobody is changing it, and it passed the tests, and you either pulled
the last changes for the packages involved or cloned at the last version.
First, clone the lxinst
package from the git and create the new branch.
unix$ git clone git@gitlab.leanxcale.com:lx/lxinst.git
unix$ cd lxinst
unix$ git checkout v2.0
unix$ git checkout -b v2.1
Then, edit the installer so it will fetch packages from the new version.
Just change src/lxdefs
and update the LXVERS
and LXREPO
definitions early
in the file. For example,
LXVERS='v2.1'
LXRLSE="unstable"
LXREPO="https://artifactory.leanxcale.com:443/artifactory/lxdist/v2.1/"
Run LXBUILD to be sure it is ok and then push the new version.
unix$ LXBUILD
unix$ git push
The scripts LXBUILD
at this and every other package will use the git branch
as the version name, but, for safety, it may be a good idea to edit the default
version number they use as well.
Now, go through the release packages and, for each one, create a branch with the new version, for example:
unix$ git checkout v2.0
unix$ git checkout -b v2.1
unix$ git push
It is a good idea to do this in order, using the package list shown earlier in this document.
You should also go to blade105
or wherever you are running daily tests, and
update the setup to test the new development release daily.
One way is to update the existing setup to run the new release. You might also copy it to a different directory to keep the testing setup for the old one.
For example, rename the directory for the new release and fetch the new branches:
cd $HOME
mv lxtest20 lxtest21
cd lxtest21
git fetch && git checkout v2.1
cd qe_calcite ; git fetch && git checkout v2.1
cd ..
cd lxpy ; git fetch && git checkout v2.1
cd ..
cd kv ; git fetch && git checkout v2.1
Then edit the crontab to run the new release tests
crontab -e
... and change lxtest20 to lxtest21 ...
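The names that change with each release follow a simple pattern in the examples of this section: the repository URL ends in the version, and the test directory drops the v and the dot from it. A small sketch of that naming, assuming the pattern holds for every release; ReleaseNames is a hypothetical helper, not an existing tool.

```java
// Hypothetical helper: derives per-release names from a version like "v2.1".
final class ReleaseNames {
    private ReleaseNames() {}

    // Value for LXREPO, following the example shown for src/lxdefs.
    static String repoUrl(String version) {
        return "https://artifactory.leanxcale.com:443/artifactory/lxdist/"
            + version + "/";
    }

    // Test directory name, for example "v2.1" gives "lxtest21".
    static String testDir(String version) {
        return "lxtest" + version.replace("v", "").replace(".", "");
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "v2.1";
        System.out.println("LXVERS='" + version + "'");
        System.out.println("LXREPO=\"" + repoUrl(version) + "\"");
        System.out.println("test directory: $HOME/" + testDir(version));
    }
}
```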
The unstable distribution will be published directly by the pushes.
To generate the stable distribution, go to the blade running the lxtest
program mkrlse
and run it.
Do this in the branch for the new version, of course.
That will run the tests, and, when they pass, publish the new stable distribution for the new version.
Once the new version is ok for a new public distribution, use
mkrlse public
as usual.
21. Hunting leaks
To hunt for leaks, use a running system.
For Java processes, locate the process pid, eg.
unix$ jps
683466 LXMeta
683448 LXQE
672108 GradleDaemon
687379 Jps
Then, extract a heap histogram:
unix$ jmap -histo:live 683448 | sed 50q > map1.out
Here, 50 lines of output suffice unless you want to check out everything. After a while, giving time to the GC to enter and reclaim unused space, extract a new one:
unix$ jmap -histo:live 683448 | sed 50q > map2.out
Then, you can use the jleak
tool as found on the bin
directory of the lxtest
project,
to compare both histograms and report the types whose total allocation
grew, sorted by the increase in size in bytes.
unix$ jleak map1.out map2.out
java.lang.invoke.DirectMethodHandle$Accessor: 349 objs 13960 bytes
java.lang.invoke.BoundMethodHandle$Species_L: 424 objs 13568 bytes
java.util.ArrayList: 564 objs 13536 bytes
java.util.zip.Inflater: 209 objs 13376 bytes
Here, we got 564 extra ArrayLists with their total size increased by 13536 bytes. You’ll have to search for them in a heap dump if that is a suspected leak.
It is also useful to produce a full heap dump
unix$ jmap -dump:format=b,file=heap.dump 683448
and to inspect the resulting file with something like the Eclipse Memory Analyzer, which includes an automated report for suggested leaks.
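The kind of comparison made on the two histograms can be sketched in Java as follows. This is not the jleak implementation, only an illustration under the assumption that each data row of jmap -histo output has the form "num: #instances #bytes class name"; HistoDiff and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a histogram comparison; not the actual jleak tool.
final class HistoDiff {
    // Matches rows such as "  3:  564  13536  java.util.ArrayList (java.base@17)".
    private static final Pattern ROW =
        Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    private HistoDiff() {}

    // Parses histogram rows into class name -> {instances, bytes}.
    // Header and total lines do not match the pattern and are skipped.
    static Map<String, long[]> parse(List<String> lines) {
        Map<String, long[]> histo = new HashMap<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                histo.put(m.group(3), new long[] {
                    Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) });
            }
        }
        return histo;
    }

    // Reports the classes whose total bytes grew, largest increase first.
    static List<String> grown(Map<String, long[]> before, Map<String, long[]> after) {
        List<Map.Entry<String, long[]>> deltas = new ArrayList<>();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] old = before.getOrDefault(e.getKey(), new long[] {0, 0});
            long[] d = { e.getValue()[0] - old[0], e.getValue()[1] - old[1] };
            if (d[1] > 0) {
                deltas.add(Map.entry(e.getKey(), d));
            }
        }
        deltas.sort((a, b) -> Long.compare(b.getValue()[1], a.getValue()[1]));
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, long[]> e : deltas) {
            out.add(e.getKey() + ": " + e.getValue()[0] + " objs "
                + e.getValue()[1] + " bytes");
        }
        return out;
    }
}
```

To use it, read map1.out and map2.out into lists of lines, call parse on each, and print the result of grown.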
Now, for C processes, and for the C part of the QE process, you can use memleax
.
It can be installed as in
unix$ git clone https://github.com/WuBingzheng/memleax.git
unix$ cd memleax
# these are needed unless the packages have been installed
unix$ sudo apt install cmake
unix$ sudo apt-get install libunwind-dev
unix$ sudo apt-get install libelf-dev
unix$ sudo apt-get install libdwarf-dev
unix$ sudo apt-get install libdw-dev
unix$ mkdir build
unix$ cd build
unix$ cmake ..
unix$ make
unix$ sudo make install
Once available, you can run it against a running pid. It intercepts calls to memory allocation and deallocation, and reports the allocations that survive the given interval in seconds.
For example
unix$ memleax -e 60 683448