Data Definition Language (DDL) Syntax

The SQL Data Definition Language (DDL) provides users with the capability to create tables, indexes, sequences, views, online aggregates, as well as databases, users, roles, and permissions. These are the supported statements:

ddlStatement:
	| createDatabaseStatement
	| dropDatabaseStatement
	| createSchemaStatement
	| dropSchemaStatement
	| createTableStatement
	| dropTableStatement
	| truncateTableStatement
	| alterTableStatement
	| createIndexStatement
	| dropIndexStatement
	| createViewStatement
	| dropViewStatement
	| createSequenceStatement
	| dropSequenceStatement
	| createOnlineAggregate
	| dropOnlineAggregate
	| createUserStatement
	| dropUserStatement
	| grantStatement
	| revokeStatement
	| createRoleStatement
	| dropRoleStatement
	| createTriggerStatement
	| dropTriggerStatement
	| enableTriggerStatement
	| disableTriggerStatement

1. Predefined Database, Schema, Admin User, Default Schemas

LeanXcale has a predefined database that is DB. LeanXcale has a predefined schema APP in all databases. The user LXADMIN is administrator of all schemas of all databases in the system. Each user has a schema associated that has the same name as the user. The default schema for LXADMIN is APP. The default schema for all other users is their own schema. When database objects are created without qualifying the name with the schema they are created in the default schema of the user.

2. CREATE DATABASE and DROP DATABASE Statements

The CREATE DATABASE allows to create a new database:

createDatabaseStatement=
	CREATE DATABASE databaseName [USER userName IDENTIFIED BY 'password']*

The user LXADMIN is administrator of all databases in the system. If the optional USER clause is provided, then a user userName is created within the new database and it becomes the owner of the database, that is, it has all permissions over the newly created database. Note that a connection is over a particular database. So after creating a database to be able to access it, it is needed to reconnect to the system with the new database.

These are two examples of use:

CREATE DATABASE MYDB
CREATE DATABASE MYDB2 USER USR IDENTIFIED BY 'foobar'

The first example creates the database without specifying the owner, so the owner by default will be LXADMIN. The second example creates the database with user USR as owner. This user is created as part of the database creation and made the owner of the database. Remember that owners are local to each database, so USR is a user only on MYDB2 database.

The DROP DATABASE enables to remove a database:

dropDatabaseStatement=
	DROP DATABASE databaseName

An example removing one of the databases we created before is:

DROP DATABASE MYDB

3. CREATE SCHEMA AUTHORIZATION Statement

The CREATE SCHEMA AUTHORIZATION enables to perform a batch of creation of tables and views, and grant permissions on a given schema:

createSchemaStatement=
	CREATE SCHEMA AUTHORIZATION schemaName
		createAndGrantStatement [createAndGrantStatement]*

createAndGrantStatement=
	createTableStatement
	| createViewStatement
	| grantStatement

Note that the schema should exist, since the statement only performs the specified batch of creates and grants over the existing schema. If one of the create statements fails, it will not continue processing the rest.

4. CREATE TABLE Statement

The CREATE TABLE command allows the creation of a table and follows the specified syntax as outlined below:

createTableStatement=
	CREATE [CACHE] [DELTA] TABLE
	[ IF NOT EXISTS ] name
	[ NO OLD VERSIONS ]
	[ '(' tableElement [, tableElement ]* ')' ]
	[ AS queryStatement ] [ partitionBySpec [partitionBySpec]*  [ distributeBySpec  ]* ]

4.1. Table Types

The purpose of table creation is to define the structure of the table, including column names, attributes, and data types. There are three types of tables: regular, cache, and delta.

Regular tables are created using CREATE TABLE. Cache tables (CREATE CACHE TABLE) are always stored in memory, in addition to disk, to ensure rapid access. Delta tables (CREATE DELTA TABLE) include delta columns, allowing the accumulation of values with conflict-free updates, such as a sum.

4.2. IF NOT EXISTS Clause

Attempting to create a table that already exists results in an error. To avoid this error, the IF NOT EXISTS clause can be used, ensuring that the table is only created if it does not already exist.

4.3. Old Versions

LeanXcale as most modern databases, employs Multi-Version Concurrency Control (MVCC). Under MVCC, updates are not executed in place; instead, a fresh version of the tuple is generated with each update. This mechanism enables efficient concurrency control by avoiding read-write conflicts.

In specific scenarios, when it is known that the application exclusively performs either data insertion or data reading, but not both concurrently, it is possible to disable multi-versioning. This action is undertaken to remove the associated overhead and increase the efficiency of insert and update operations within the LeanXcale database.

4.4. Column Names, Types, Constraints, Online Aggregates

The list of tableElements is where it is specified the column names and any attributes associated to them. A tableElement can be either a column specification or a constraint specification as can be seen in the following syntax:

tableElement=
	tableColumn
	| [ CONSTRAINT name ] tableConstraint

A tableColumn has the following syntax:

tableColumn=
	columnName [kindOfDeltaColumn] type [ NOT NULL ] [ columnGenerator ] [ PRIMARY KEY ]

In its most fundamental form, a column specification consists of a column name followed by its data type. The inclusion of the modifier NOT NULL imposes a constraint, indicating that the column cannot contain NULL values. Additionally, the PRIMARY KEY modifier designates that the column is a part of the primary key for the table.

Columns can also be automatically generated in three distinct ways. The first method involves providing a default value to be used when no explicit value is provided. The second method employs an auto-incrementing value, achieved through the use of the AS IDENTITY clause. The third method utilizes a delta column, wherein updates and inserts are treated as an accumulation function over the current value of the column. The first two ways are indicated in the columnGenerator syntax:

columnGenerator:
	DEFAULT expression
	| [ GENERATED ALWAYS ] AS IDENTITY
	[ START WITH initialValue ]
	[ INCREMENT BY incrementValue ]

The AS IDENTITY clause indicates that is auto-increment key. It supports also the form GENERATED ALWAYS AS IDENTITY for compatibility with the PosgreSQL dialect. The START WITH subclause allows to specify the first value to be given to the auto-increment column. The INCREMENT BY subclause enables to indicate the value by which to increment the auto-increment column.

The third way is specified with the kindOfDeltaColumn syntax:

kindOfDeltaColumn =
	( SUM | MIN | MAX | COUNT )

The specification of an accumulation function determines the manner in which results are aggregate. For instance, when utilizing the SUM function, each update entails updating the SUM column with the previous value of the column augmented by the value being updated. MIN and MAX functions operate by obtaining the minimum or maximum, respectively, over the current value of the column and the new value introduced in the update. In the case of COUNT, it increments the previous value of the column by one, irrespective of the updated column value.

Additionally, various constraints can be specified over the columns. This is the constraint syntax:

tableConstraint:
	PRIMARY KEY '(' columnName [, columnName ]* ')'
	| PRIMARY GEOHASH KEY '(' latitudeColumnName , longitudeColumnName ')'
	| PRIMARY GEOHASH KEY '(' geoColumnName ')'
	| FOREIGN KEY '(' columnName [, columnName ]* ')'
	| REFERENCES tableName '(' columnName [, columnName ]* ')'
	| UNIQUE '(' columnName [, columnName ]* ')'
	| GEOHASH '(' latitudeColumnName , longitudeColumnName ')'
	| GEOHASH '(' geoColumnName ')'
	| CHECK expression

The PRIMARY KEY constraint defines the columns that constitute the primary key of a table and establishes the order in which they are arranged.

The PRIMARY GEOHASH KEY constraint defines a geohash primary key. It can be defined in two ways. The first way is providing two FLOAT or DOUBLE columns with the latitude and longitude. The second way is providing a geo column that is a STRING column using WKT markup language representing a vector geometry object. The GEOHASH constraint generates a hidden STRING column with the geohash of the indicated column(s). This hidden column is the one used as primary key.

The FOREIGN KEY constraint indicates that one or more columns within a table refer to the primary key in another table. Following the FOREIGN KEY keyword, the specific columns in the current table are listed. Subsequently, the REFERENCES keyword is used to identify the table name of the referenced table, along with the columns serving as primary keys in the referenced table.

The GEOHASH constraint indicates what columns to use to generate a hidden STRING geohash column that is used to generate a secondary index over that column.รง This geohash secondary index enables to improve geo searches such as a point is contained in a geometric shape. A geohash column can also be used as part of a composed primary key, for instance, PRIMARY KEY (country, geoHashCol). In the example, one can do searches with country and a geo condition, and thus, a search will find the first row of a country, and then perform a geo search within the country.

The UNIQUE constraint mandates that the specified column or set of columns consistently maintain distinct values across various rows in the table.

The CHECK constraint ensures that the stipulated condition is met for any values inserted into the associated column.

4.5. AS Clause

The AS clause is used to define the table schema based on the result set of a query statement. The table is then populated with the rows from the result set. When you specify AS clause, you may omit the list of tableElement_s since they will be taken from the resultSet, or you can provide the columnName and omit the data type from any _tableElement, in which case it renames the underlying column.

4.6. PARTITION BY Clause

The PARTITION BY clause plays a pivotal role in determining the fragmentation strategy of a table within LeanXcale, a critical factor for optimizing operational efficiency. If no partitioning is specified for a table, it means all the workload of the table will be handled by a single storage server and therefore, at most one core will be used to process that workload. If partitioning is specified, one can also indicate the criteria to distribute the partitions. The PARTITION BY clause has a crucial role in defining the fragmentation strategy employed by a table, representing a critical element for optimizing operational efficiency. When partitioning is explicitly specified, it becomes possible to define the criteria to distribute the partitions. The PARTITION BY clause syntax is as follows:

partitionByClause =
PARTITION BY
	KEY `(` columnName [, columnName]* `)` AT  tupleLiteral [, tupleLiteral]*
	| DIMENSION  columnName
		AT  literal [ , literal ]*
		| EVERY intervalLiteral
			[KEEP intervalLiteral]
		| EVERY unsignedIntegerLiteral
			[KEEP unsignedIntegerLiteral]
	| HASH `(` columnName [, columnName ]* `)`

tupleLiteral=
	`(` literal [ , literal ]* `)`

LeanXcale supports several methods for table partitioning that can be used in isolation or in a combined way:

  • Prefix of Primary Key: The table can be partitioned based on a prefix of its primary key.

  • Dimension-based Partitioning: Partitioning can be performed using a specified dimension, typically another column in the table. Each PARTITION BY DIMENSION clause only has a single column. Each clause adds a new dimension to the partitioning. The column should be part of the primary key. The columns of the primary key that are dimensions used for partitioning should be a suffix of the primary key (never including the first column of the primary, since this can be achieved with PARTITION BY KEY).

  • Auto-partitioning on Dimension Columns of temporal or auto-increment nature.

  • Hash partitioning: It can be specified over one or more columns of the primary key. It computes a hash over the specified columns and the value is stored in a new hidden column named HASHID. This column becomes the last column of the table and the last column of the primary key. There are as many partitions as storage servers.

Auto-partitioning can be applied to dimension columns of type TIMESTAMP, TIME, DATE, or an auto-increment integer. This automatic partitioning is particularly crucial for historical tables where data spans over time, such as sales or time series. As tables, especially historical ones, grow in size, accessing them becomes increasingly slower. Auto-partitioning based on temporal or auto-increment fields ensures that table fragments remain manageable in size, optimizing access efficiency. In the context of historical tables, the field keeping the temporal column or auto-increment counter is the one that can be used for auto-partitioning. Since in historic tables, it is the last fragment the one being consistently accessed, while the other fragments are infrequently accessed, auto-partitioning ensures that only the actively used fragments consume resources, mitigating inefficiencies associated with accessing large historical datasets.

When using auto-partitioning with TIMESTAMP the time span should be specified with INTERVAL. Some examples are:

  • INTERVAL '1' YEAR.

  • INTERVAL '1' YEAR TO '6' MONTH.

  • INTERVAL '6' MONTH.

  • INTERVAL '1' MONTH.

  • INTERVAL '90' DAY.

  • INTERVAL '7' DAY.

  • INTERVAL '1' DAY.

  • INTERVAL '12' HOUR.

  • INTERVAL '90' SECONDS.

  • INTERVAL '15' SECONDS.

The utilization of the PARTITION BY KEY AT clause offers the capability to partition data based on a prefix of the primary key (including the whole key),

On the other hand, the PARTITION BY DIMENSION clause allows partitioning using any other column designated as a dimension in the CREATE TABLE statement, e.g., 'state'. This feature ensures that all rows with a specified range of values in the chosen dimension are grouped into the same partition.

Within the AT subclause, specific split points for partitioning are provided. In contrast, the EVERY subclause triggers automatic partitioning at specified intervals, streamlining the process of creating partitions based on the defined dimension.

4.7. DISTRIBUTE BY Clause

If partitions have been specified, then it is possible to indicate how to distribute the partitions across the storage servers with the distribute by clause. It has the following syntax:

distributeBySpec=
	NO DISTRIBUTE
	| DISTRIBUTE BY HASH
	| DISTRIBUTE BY KEY
	| DISTRIBUTE BY DIMENSION dim

The distribution of data can be achieved through one several criteria in LeanXcale:

  • NO DISTRIBUTE. It does not distribute the table, so it will be stored at a single storage server that will handle all the workload for that table.

  • DISTRIBUTE BY KEY. Utilizes the primary key partitions to distribute across storage servers. Cannot be combined with other DISTRIBUTE BY clauses.

  • DISTRIBUTE BY DIMENSION. Uses the specified partitioned dimension column to distribute.

  • DISTRIBUTE BY HASH. Uses the specified hashing to distribute. Cannot be combined with other DISTRIBUTE BY clauses.

  • Default: If no DISTRIBUTE clause is specified, automatic distribution is used, i.e., the system decides how to distribute the partitions.

A new column, HASHID, will be appended to the primary key that will be the last column of the primary key. For insert operations, the column HASHID should be filled with 0 and the database will fill it with the right value. It will create as many partitions as the number of storage engines in the database. It should be noted that DISTRIBUTE BY HASH cannot be combined with other distribution methods. If HASH partitioning has been specified no other distribution criteria is allowed.

The HASH partitioning and distribution method is straightforward to specify as it doesn’t require knowledge of data distribution on column ranges. However, it comes with a significant tradeoff - reads become significantly more expensive since they must be performed across all partitions of the table. There is an overhead associated to compute the hash, so only add the columns really necessary for an even partitioning. Also using less columns than necessary will result in an skewed partitioning.

When using DISTRIBUTE BY KEY, it should be noted that the partitions made by primary key will be distributed among the storage servers. If there are less primary key partitions than storage servers, then only a subset of the storage servers will be used for the table. It is recommended to have as many partitions as storage servers or a multiple of them.

When using DISTRIBUTE BY DIMENSION, it happens the same as with DISTRIBUTE BY KEY. Only the partitions that have been defined will be distributed among the storage servers. If there less partitions than storage servers, only a subset of them will be used. Again it is recommended that the number of partitions used for distribution are the same as storage servers.

If NO DISTRIBUTION is indicated, it means that all the table fragments will remain on a single storage server. It should be noted that even with several storage servers, the common case, one can distribute tables across storage servers, so no distribution is need on a per table basis. Having said so, in most cases the most effective way to distribute the workload is distributing all tables across all storage servers, this is especially true for large tables over which large queries are performed and tables with high loads of any other kind, either ingestion or queries.

If partitioning is specified but no explicit distribution method is defined, automatic distribution will be performed based on the following criteria: If hash partitioning is defined it will be used as distribute criterion. Otherwise, it distributes the partitions across storage servers.

For more in-depth details, refer to the section on writing efficiently in LeanXcale.

5. DROP TABLE and TRUNCATE TABLE Statements

The DROP TABLE statement facilitates the deletion of a specified table in SQL. It includes an optional IF EXISTS clause, which, when utilized, prevents the statement from generating an error if the table does not exist. The syntax is:

dropTableStatement=
	DROP TABLE [ IF EXISTS ] name

The TRUNCATE TABLE statement in SQL is used to quickly and efficiently remove all rows from a table. Unlike the DELETE statement, which removes rows one by one and generates individual row delete operations, TRUNCATE TABLE is a more direct and faster operation for removing all data from a table. It is syntax is:

truncateTableStatement=
	TRUNCATE TABLE name

6. ALTER TABLE Statement

The ALTER TABLE statement is used to modify the structure of an existing table. It allows you to add, modify, or drop columns, constraints, indexes, or perform other changes to the table definition. Its syntax is:

alterTableStatement=
      ALTER TABLE name alterTableAction

alterTableAction=
	RENAME TO newName
	| DROP COLUMN columnName
	| DROP constraintName
	| ADD COLUMN columnName columnType
	| ADD [ CONSTRAINT name ] tableConstraint
	| ADD PARTITION
		(
		KEY `(` columnName [, columnName]* `)` AT tupleLiteral [, tupleLiteral]*
		| DIMENSION columnName AT literal [, literal]*
		)

7. CREATE ONLINE AGGREGATE and DROP ONLINE AGGREGATE Statements

Online aggregates is a novel feature that LeanXcale provides. It can be thought as a materialized aggregation view with very efficient real-time update (it is updated as part of the transaction modifying the parent table). Basically, an online aggregate is a materialized aggregation query that is updated as part of the transaction updating the parent table, i.e., the table from which is derived. The syntax is as follows:

createOnlineAggregate:
	CREATE [DURABLE] ONLINE AGGREGATE
		[ IF NOT EXISTS ] [ NO OLD VERSIONS] aggregateName
		AS { * | projectItem [, projectItem ]* }
	FROM tableExpression
		[WHERE booleanExpression]
			[ GROUP BY { * | projectItem [, projectItem ]* } ]
				[ ( AUTOSPLIT columnName splitperiod [ AUTOREMOVE AFTER persistperiod ]
					| SPLIT columnName EVERY numericOrIntervalExpression [ AUTOREMOVE AFTER numericOrIntervalExpression ]

The optional "DURABLE" parameter serves to determine the persistence behavior of the online aggregate in relation to the parent table. When specified, the "DURABLE" attribute ensures that the online aggregate remains intact even if the parent table undergoes deletion. In contrast, when "DURABLE" is not explicitly indicated, the deletion of the parent table triggers a cascade deletion, leading to the removal of the associated online aggregate. The "IF NOT EXISTS" clause, analogous to its usage in the CREATE TABLE statement, does not result in an error if the online aggregate table already exists. This provision ensures that the creation process proceeds smoothly without interruption in the presence of a pre-existing online aggregate table. The "NO OLD VERSIONS" clause, akin to its counterpart in the CREATE TABLE statement, serves to deactivate multi-versioning. Consequently, only the latest version of each aggregate is retained, foregoing the retrieval of versions corresponding to transaction snapshots. The "AS" clause facilitates the specification of aggregates that the online aggregate will keep updated (such as SUMs, COUNTs, MINs, MAXs) and designates the columns from the parent table over which these aggregates are computed. Online aggregates, derived from a designated parent table indicated in the FROM clause, can be further filtered using the optional WHERE clause. This clause allows for setting a boolean condition, determining which rows from the parent table are considered by the online aggregate. The optional GROUP BY clause provides a means to specify the grouping criteria for computing the aggregates. If left unspecified, the aggregation encompasses all rows that satisfy the filter condition (if specified) or all rows if no filter condition is provided. Given that online aggregates are commonly defined over historical tables, using auto-splitting becomes imperative. The AUTOSPLIT clause may be specified over a temporal column or an auto-increment integer column. The optional AUTOREMOVE clause offers the ability to specify how data should be pruned after a certain period. It is should be noted that the intervals specified in "AUTOSPLIT" and "AUTOREMOVE" should align with those of the parent table or be a multiple thereof.

An online aggregate can be deleted with DROP ONLINE AGGREGATE:

dropOnlineAggregate=
	DROP ONLINE AGGREGATE onlineaggregateName

8. CREATE INDEX and DROP INDEX Statements

Secondary indexes play a pivotal role in optimizing access to columns that do not constitute a prefix of the primary key. The creation of these indexes is accomplished through the utilization of the CREATE INDEX statement, adhering to the subsequent syntax:

createIndexStatement=
CREATE [UNIQUE] INDEX [ IF NOT EXISTS ] indexName
	ON tableName '(' columnName [, columnName ]* ')'
		[INCLUDE '(' columnName [, columnName ]* ')']
		[GLOBAL [ partitionBySpec [ distributeBySpec ] ]

The optional "UNIQUE" clause serves to specify that only a singular value can exist for each combination of columns within the index. This uniqueness constraint ensures that the indexed columns collectively have distinct values.

The index is delineated over a specific table, as denoted by the ON clause. Following the specification of the tableName, a sequence of columns constituting the secondary key is enumerated within parentheses. These columns define the structure of the secondary index, facilitating efficient retrieval of data based on the specified column combinations.

The INCLUDE clause allows to defined covered indexes, that is, indexes that include additional columns. This is useful for optimizing queries over the secondary index. Basically, it enable to recover the value of the columns without having to perform a search on the primary key.

The GLOBAL clause indicates that the index will be global. Without the GLOBAL clause, indexes are local to each table fragment. That is, each table fragment has a fragment for the secondary index that only contains secondary keys for the tuples stored in the associated fragment of the primary table. Secondary indexes are convenient because they enable to read the tuple within the same storage server. However, there is an involved tradeoff. When searching with a secondary key, all storage servers containing fragment of the table are sent the query, with the consequent overhead. Instead, global indexes are like a table and they can be stored on any storage server. In fact, one can specify the partitioning and distribution strategy as for regular tables. Searches in global indexes only impact the storage servers containing a fragment that has secondary keys in the searched range. So only the storage servers containing relevant information are contacted. However, after recovering the primary keys of the tuples satisfying the query, they have to be searched on the storage servers containing them. The best option is to combine the global index with and INCLUDE clause including all the columns that can be used by the query. In this way only the storage servers containing secondary keys in the searched interval are contacted and the query is solved without a second search step. Basically global covered indexes (GLOBAL plus INCLUDE) as vertical partitions in columnar data warehouses.

To eliminate an index, the DROP INDEX statement is employed, using the following syntax:

dropIndexStatement=
	DROP INDEX [ IF EXISTS ] indexName

The statement begins with the DROP INDEX keywords, followed by an optional "IF EXISTS" clause, which ensures that the operation does not result in an error if the specified index does not exist. The indexName parameter designates the identifier of the index intended for removal. This process effectively dismantles the specified index, revoking its association with the corresponding table and freeing up resources.

9. CREATE VIEW and DROP VIEW Statements

A view is a virtual table derived from the result of a SELECT query. Unlike a physical table, a view does not store the data itself but provides a way to represent the result of a query as if it were a table. Views are useful for simplifying complex queries, abstracting the underlying data model, and controlling access to specific columns or rows of a table. It has the following syntax:

createViewStatement=
	CREATE VIEW [ IF NOT EXISTS ] viewName AS query

That creates a view named "viewName" from the result of the query provided in the AS clause.

A view can be removed by the DROP VIEW statement:

dropViewStatement:
      DROP VIEW [ IF EXISTS ] viewName

10. Sequences

A sequence is an object used to generate unique numeric values in a specified order. Sequences are commonly used to produce primary key values for tables, ensuring that each new record has a unique identifier. Unlike auto-increment columns, sequences provide greater flexibility and can be shared among multiple tables. Sequences use a BIGINT type (long integer). They are created by means of the CREATE SEQUENCE statement:

createSequenceStatement=
	CREATE SEQUENCE [ IF NOT EXISTS ] sequenceName sequenceModifier*

sequenceModifier=
	AS type
		|   START WITH initialValue
		|   INCREMENT BY incrementValue
		|   CACHE cacheSize
		|	NOFETCH

It should be noted that LeanXcale is a distributed database and might have multiple query engines and have multiple storage servers. A particular sequence will be handled by one storage server. The storage server writes each sequence value assigned to persistent storage. When a sequence is used intensely, one can regulate this behavior assigning a value to the CACHE modifier that will have the effect of writing persistently one out of CACHE times increasing significantly the efficiency. It should be noted in case of a crash of that storage server the sequence might have gaps in the following values after the recovery.

The client application interacts with a query engine. The query engine consumes a sequence value by invoking storage server in charge of that sequence. Since this interaction implies some overhead, the CACHE modifier can be used to make the query engine(s) fetch a number of values from the sequence to consume them locally. This means that in a multi-node configuration with multiple query engines each query engine will fetch a number of values. Thus, values of the sequence will not be necessarily consecutive along time. Thus, the CACHE modifier also affects the number of fetched values by each query engine. It is possible to prevent fetching to guarantee that values of the sequence are consecutive (that will be less efficient) by specifying the NOFETCH modifier.

The sequenceModifiers specify the behavior of the sequence. The START WITH clause indicates the first value to be given in the sequence. The INCREMENT BY clause sets the step size by which the sequence increases or decreases. The CACHE clause makes that query engine instance fetches a stripe of cacheSize length to assign sequence identifiers. In this way, the query engine does not have to request to a storage engine the next value of the sequence every time. Additionally, the storage server handling the sequence will persist the value of the sequence every CACHE times instead of every time. This is the preferred way of using sequences, with the CACHE option. The NOFETCH modifier disables the fetching in the query engine and guarantees that identifiers are consecutive (it has a negative performance impact).

One can consume the next sequence value by means of:

    next value for sequenceName

For instance, for a sequence called mySeq:

   select next value for mySeq;

will consume the next value of the sequence mySeq.

It is also possible to check the current sequence value (the last value consumed) of a sequence without fetching (with NOFETCH modifier) by:

   current value for seqName

For the previous example of mySeq sequence the current sequence value can obtained with:

   select current value for mySeq

Note that with fetching the concept of current value is fuzzy since each query engine will have a number of values of the sequence fetched and there is no notion of current sequence value.

A sequence can be removed with the DROP SEQUENCE statement:

   dropSequenceStatement:
   DROP SEQUENCE [ IF EXISTS ] sequenceName

11. CREATE USER, ALTER USER, and DROP USER Statements

New users can be created by means of the CREATE USER statement:

createUserStatement=
	CREATE USER userName IDENTIFIED BY 'password'

Leanxcale has two predefine users: LXADMIN and NONE. Users are case sensitive. When the are written without quotes they are always stored as capitalized. When they are written between quotes the capitalization within the quotes is used. Note that in the connection string of the drivers (JDBC, Python, ODBC) the user names are always case sensitive.

The password of a user can be changed with:

alterUserStatement=
	ALTER USER userName IDENTIFIED BY 'password'

A user can be removed by:

dropUserStatement=
	DROP USER userName

12. GRANT and REVOKE Statements

Permissions can be granted by:

grantStatement=
	GRANT permissionList
		[ON (TABLE | SEQUENCE | SCHEMA) name]
			TO userList
permissionList=
	permission [, permission]*

The grant is for all the database unless the clause ON is provided. If the clause ON is provided then the grant is local to the object provided, that is, the table, sequence or schema specified in the ON clause.

And revoked by:

revokeStatement=
	REVOKE permissionList
		[ON (TABLE | SEQUENCE | SCHEMA) name]
			TO userList

A permission can take three forms:

permission=
	action databaseObject
	| usrId
	| roleId

If it is a usrId or roleId the permission is actually the permission list associated to the user or role, otherwise, it is the permission to execute an action over a database object.

The supported actions are:

action=
	CREATE | ALTER | DROP | SELECT | USAGE | INSERT | DELETE | UPDATE | REFERENCES | COMMENT | BACKUP | ALL PRIVILEGES

The supported database objects are:

databaseObjects=
	TABLE | INDEX | VIEW | SEQUENCE | TRIGGER | PROCEDURE | ANY

13. CREATE ROLE and DROP ROLE Statements

A role is a collection of permissions. In that sense they are similar to a user. The syntax is:

createRoleStatement=
	CREATE ROLE roleName IDENTIFIED BY 'password'

Permissions can be granted to a role and a user can be granted roles that is equivalent to grant to the user each of the permissions in the role.

A role can be deleted with:

dropRoleStatement=
	DROP ROLE roleName

14. CREATE TRIGGER, DROP TRIGGER, ENABLE TRIGGER, and DISABLE TRIGGER Statements

Triggers are a predefined action that is automatically executed ("triggered") in response to certain events on a particular table or view. These events can include data modification statements (such as INSERT, UPDATE, DELETE) or specific system events. They are created by means of CREATE TRIGGER with the following syntax:

createTrigger=
	CREATE TRIGGER [ IF NOT EXISTS ] triggerName
		(BEFORE | AFTER)
		[INSERT] [UPDATE] [DELETE]
	ON tableName
	FOR EACH ROW EXECUTE triggerFunction ['(' stringLiteral ')']
	[PRIORITY intLiteral]

BEFORE and AFTER are used to indicate whether the trigger has to be executed before the triggering event or after. INSERT, UPDATE, and DELETE specify the triggering event. The ON clause specifies on which table the trigger is defined. triggerFunction LeanXcale currently only supports row level triggers, that is, triggers associated to row events. For this reason, the only form allowed is FOR EACH ROW EXECUTE that determines that is a row level trigger. triggerFunction is the name of a Java function that contains the code of the trigger. PRIORITY indicates the priority of the trigger, in case, there are several defined, they are executed in priority order.

Triggers can be removed with DROP TRIGGER:

dropTriggerStatement=
	DROP TRIGGER triggerName

Triggers can be enabled by means of:

enableTriggerStatement=
	ENABLE TRIGGER triggerName ON tableName

And disabled with:

disableTriggerStatement=
	DISABLE TRIGGER triggerName ON tableName

15. Auditing (Audit Logs)

In this section we describe how to activate and deactivate the different auditing levels.

15.1. Auditing Levels

Auditing is set for all users, unless a user is specified. LeanXcale allows to set the auditing level at database level, table level, and schema level.

For the level chosen then the auditing granularity can be set per session (session is the default granularity, if none specified), per transaction, or per operation. Session granularity means that accesses to a table is logged once per session, the first time the access happen in the session. Transaction granularity means that accesses are logged once per transaction, that is, the first access that happens to an audited table. *Operation granularity means that only accesses with a specific kind of operation will be audited: READ, WRITE, DDL, and ANY. In case all kinds of operations should be audited, then ANY can be used.

15.2. Auditing SQL Syntax

audit=
	[NO] AUDIT [USER userName] [ (TABLE tableName) | (SCHEMA schemaName) ]
		[ BY
		( SESSION
		| TRANSACTION
		| [ READ | WRITE | READ WRITE | DDL | ANY ]
		) ]

To activate audit use AUDIT and to deactivate audit use NO AUDIT. The auditing can be set for a particular user using the USER clause, or for all users if the USER clause is omitted.

To audit the full database to which you connected omit table and schema clauses, and to audit a full schema use the SCHEMA clause and to audit a particular table use the TABLE clause. Note that a tableName no qualified with the schema name is referring to the schema indicated in the connection. To refer to tables in another schema the table name has to be qualified with the schema name: schemaName.tableName. When no audit granularity is specified, that is, there is no BY clause, the default audit granularity will be SESSION.

The BY clause enables to specify the audit granularity as SESSION, TRANSACTION, or operation. In the last case, the operation kind to be audited is indicated, or ANY if all kinds of operations should be audited.

15.3. Audit Logs

The audit log file contains auditing entries, including user authentication. User authentication is always enabled and cannot be disabled. When there is no audit, one can still find the authentication audit records in the lxqe log. When the audit is on, authentication audit records will also appear in the audit log.

Logs are accessed via lx logs command. The command provides the -g option that performs a grep with the pattern specified after -g. For example, to locate authentication entries for the user lxadmin the flag -g (for grep) can be used as shown here:

	unix$ lx logs -g auth.*lxadmin audit
	logs...
	atlantis: [
		log/audit.240221.0708.log	1.75K [
		1:	2024-02-21 07:08:10 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		2:	2024-02-21 07:08:12 lxqe100: audit: auth: db-USR2: 127.0.0.1: local: yes
		3:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		4:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		5:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		...
		]
	]

The user names include both the database and the user name, as a convenience. An exception to this rule is lxadmin that is a user across all databases, while the rest of the users are local to one database.

Authentication failures are reported as follows:

2024-02-29 06:40:52 lxqe100: audit: auth: db-FOO: 127.0.0.1:39018: local: no:auth failed

Another examples is:

	AUDIT TABLE MYTBL
	AUDIT USER USR1
	AUDIT USER USR1 TABLE MYTBL

And after trying to update the table MYTABL by USR1, the audit log file will include this audit record:

2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: /db/USR1/tbl/MYTBL

When setting auditing with operation granularity, the audit log will contain audit records like:

2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: SELECT count(*) FROM MYTBL

When permission is denied to execute a statement, an audit record is added, if audit is enabled:

2024-02-29 06:40:51 lxqe100: audit: perm: db-TEST1: /db/APP/tbl/PERSONS: select on table: permission denied

Authentication audit records are as follows:

2024-02-16 08:45:08,752 INFO lxqe: authenticate: lxadmin: local: yes

report that the user lxadmin (or whoever it was) was authenticated by the local (i.e., the DB) user entry.

When using LDAP, the audit record shows the following:

2024-02-16 08:45:08,752 INFO lxqe: authenticate: db-USR1: ldap: yes

This reports both the database used (db) and the user involved (USR1).

Authentication errors are reported in a similar way:

2024-02-16 08:48:09,992 INFO lxqe: authenticate: db-USR2: ldap: no: [LDAP: error code 49 - Invalid Credentials]

SQL Data Management Language (DML) Reference

This page describes LeanXcale’s SQL DML.

The SQL syntax is described using an EBNF-like form.

1. Isolation Levels

Database transactions provide the so-called ACID properties. The I stands for isolation and it takes care of the consistency of the data in the advent of concurrent transactions updating the same row(s). The higher the consistency level the most consistency is attained, but it is more costly to execute transactions. The levels from less to more consistency are (short names between parenthesis):

  • Loader: Mode to load data without any other parallel activity. It deactivates conflict checking and logging, a transaction gets as start timestamp the current snapshot. It is highly efficient for loading.

  • Raw_read_committed: (rawrc) This isolation level ensures that a transaction can only see committed changes made by other transactions. It also eliminates dirty reads. When the transaction commits its updates become visible, its updates become visible as they reach the involved storage servers, so it would be possible for another transaction to see only part of the updates made by a transaction. No conflicts are checked (writes over the same row by parallel transactions, transactions overlapping in time are allowed).

  • Read_committed (rc): This isolation level ensures that a transaction can only see committed changes made by other transactions. It also eliminates dirty reads. However, it allows non-repeatable reads, as a transaction may see different values for the same record within the same transaction if other transactions commit changes. When the transaction starts it waits for the snapshot to reach the commit timestamp of the last transaction committed in the session, and then it gets the snapshot as start timestamp.

  • Snapshot_isolation. It guarantees repeatable reads and no phantom reads. This isolation levels is called serializability in other database systems (incorrectly). Only write skew anomalies might happen. In case write skew might happen, it can be avoided by using select for update. The trasaction when started waits for the snapshot to be equal or higher than the commit timestamp of the last committed transaction in the session. Then it gets the current snapshot as start timestamp.

The default isolation mode is read_committed. The isolation mode is specified in the connection string in the driver with the property mode using either the long or short name of the isolation mode. The isolation mode can be changed via the SQL statement SET ISOLATION LEVEL described in Section 5

2. DML SQL Statements

The DML SQL statements supported are:

statement=
	queryStatement
	| upsertStatement
	| insertStatement
	| updateStatement
	| deleteStatement
	| mergeStatement
	| setIsolationLevelStatement
	| explainStatement
	| execStatement
	| batchStatement

Multiple SQL statements can be provided by separating them with semicolons:

statementList=
	statement; [statement';']*

3. Query Statement

A query statement can have the following forms:

queryStatement=
	( selectStatement
	| selectForUpdateStatement
	| unionStatement
	| intersectStatement
	| exceptStatement
	| intersectStatement
	| withStatement
	| valuesStatement )
		[ limitClause ]
		[ offsetClause ]
		[ fetchClause ]

3.1. SELECT Statement

The most common statement is the SELECT statement. The SELECT is accepted with FROM (the standard) and also without FROM to have compatibility with those SQL dialects that accept the SELECT without FROM:

selectStatement=
	( SELECT [ ALL | DISTINCT ]
		{ * | projectItem [, projectItem ]* }
	[ FROM tableExpression ]
	[ WHERE booleanExpression ]
	[ GROUP BY { groupItem [, groupItem ]* } ]
	[ HAVING booleanExpression ]
	[ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]
	[ FOR UPDATE]
	[ ORDER BY item(, item)*]
	[ limitClause
	| OffsetClause [limitClause | FetchClause]
	| FetchClause ]

3.1.1. ALL/DISTINCT clause

Specifying the DISTINCT clause, filters out the same rows and returns only different (i.e., distinct) values in the resultset. The ALL clause returns all values, i.e., including repeated rows, it is the default behavior.

After the optional ALL/DISTINCT clause the list of columns to be projected is provided:

projectItem=
      expression [ [ AS ] columnAlias ]
  |   tableAlias . *

3.1.2. FROM Clause

In the optional FROM clause a tableExpression is provided that is basically the list of tables used in the select and how they should be joined (kind of join, join condition):

tableExpression=
	tableReference [, tableReference ]*
	| tableExpression [ NATURAL ]
	[ ( LEFT | RIGHT | FULL ) [ OUTER ] ]
		JOIN tableExpression [ joinCondition ]
	| tableExpression CROSS JOIN tableExpression

tableReference=
	tablePrimary
	[ matchRecognize ]
	[ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]

tablePrimary=
	[ [ catalogName . ] schemaName . ] tableName
		'(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'
	| [ LATERAL ] '(' queryStatement ')'
	| UNNEST '(' expression ')' [ WITH ORDINALITY ]
	| [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]* ')' ')'

joinCondition=
	ON booleanExpression
	| USING '(' column [, column ]* ')'

A SELECT without FROM is equivalent to VALUES (included for compatibility with PosgreSQL), as for example:

SELECT 1, CURRENT_TIMESTAMP;

This just yields the value 1 and the current timestamp in the database server.

3.1.3. WHERE Clause

The optional WHERE clause enables to provide a filter condition to select only the rows that fulfill the condition.

[ WHERE booleanExpression ]

3.1.4. GROUP BY and HAVING Clauses

The optional GROUP BY clause:

[ GROUP BY { groupItem [, groupItem ]* } ]
 	[ HAVING booleanExpression ]

enables to aggregate rows with the same values in a set of columns into summary rows.

groupItem=
	expression
	| '(' ')'
	| '(' expression [, expression ]* ')'
	| CUBE '(' expression [, expression ]* ')'
	| ROLLUP '(' expression [, expression ]* ')'
	| GROUPING SETS '(' groupItem [, groupItem ]* ')'

The optional HAVING clause the HAVING clause is used in conjunction with the GROUP BY clause to filter the results of a query based on aggregate functions applied to grouped data. The reason is because the WHERE keyword cannot be used with aggregate functions.

In the SELECT, HAVING and ORDER BY clauses of an aggregate query, all expressions must be constant within the current group (that is, grouping constants as defined by the GROUP BY clause, or constants), or aggregate functions, or a combination of constants and aggregate functions. Aggregate and grouping functions may only appear in an aggregate query, and only in a SELECT, HAVING or ORDER BY clause.

3.1.5. WINDOW Clause

The optional WINDOW clause:

[ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]

is used in conjunction with window functions to define a window of rows over which the window function will operate. Window functions perform a calculation across a set of table rows related to the current row. The WINDOW clause allows you to specify the window frame or partition over which the window function operates:

window=
	windowName
	| windowSpec

windowSpec=
	'('
	[ windowName ]
	[ ORDER BY orderItem [, orderItem ]* ]
	[ PARTITION BY expression [, expression ]* ]
	[ RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
		| ROWS numericExpression { PRECEDING | FOLLOWING }
	]
	')'

3.1.6. ORDER BY Clause

The ORDER BY clause allows to sort the resultset by a set of columns and/or expressions:

orderByClause=
	ORDER BY expression [DESC | ASC] [NULLS (FIRST | LAST)] [, expression [DESC | ASC]]*

For each expression (it can be just a column name) one can optionally specify if the order is ascending (ASC) or descending (DESC). One can also optionally specify whether NULLS should appear before other values (NULLS FIRST) or after other values (NULLS LAST). By default, ordering is ascending and nulls appears at the end (i.e., ASC NULLS LAST for all expressions).

3.1.7. LIMIT Clause

The optional LIMIT clause is used to restrict the number of rows returned by a query. Additionally, an additional OFFSET clause can be used which allows to skip a certain number of rows before starting to return rows. The syntax is:

limitClause=
		[ LIMIT (count | ALL) ]

3.1.8. OFFSET Clause

The optional OFFSET clause:

offsetClause =
	[ OFFSET start ( ROW | ROWS ) ]

is used in conjunction with the LIMIT clause to specify the starting point within the result set for fetching rows. It allows you to skip a certain number of rows before beginning to return rows.

3.1.9. FETCH Clause

The optional FETCH clause:

fetchClause =
	[ FETCH ( FIRST | NEXT ) [ count ] ( ROW | ROWS ) ONLY ]

is used to retrieve a specified number of rows from the result set produced by a query. It is often used in conjunction with the OFFSET clause for pagination purposes, where you want to retrieve a specific subset of rows from a larger result set. The FIRST keyword indicates that you want to retrieve the first set of rows from the result set. The NEXT keyword indicates that you want to retrieve the next set of rows from the result set. The ONLY keyword is used in conjunction with the FETCH clause to specify that only the specified number of rows should be returned, and no additional rows should be fetched beyond that limit.

3.1.10. FOR UPDATE Clause

Snapshot isolation isolation level can suffer from write-skew anomaly in some scenarios, when an integrity constraint should be preserved among 2 or more rows, updating only a subset of them. An example of write skew is an operation that can withdraw funds from any two accounts for the same customer as far as the joint balance is positive (i.e. the sum bot both account balances is greater or equal than zero).

Assume that initially two accounts from the same customer with customerid 27 has a joint balance of 10:

+------------+---------+------------+
| ACCOUNTID | BALANCE | CUSTOMERID |
+------------+---------+------------+
| 1          | 3       | 27         |
| 2          | 7       | 27         |
+------------+---------+------------+

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

We have 2 concurrent transactions with isolation level set to snapsthot isolation and with AUTOCOMMIT set to OFF. The first transaction wants to withdraw from accountid 1 an amount of 5. It checks the current joint balance:

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

Since 5 ⇐ 10 it will proceed to subtract 5 from accountid 1. However, before the transaction commits, a second transaction wants to withdraw from accountid 2 an amount of 6. So it checks the current joint balance:

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

Since 6 ⇐ 10 it will proceed to subtract 6 from accountid 2.

Now the first transaction updates accountid 1 with an amount 3-5=-2 and commits:

UPDATE your_table_name
SET BALANCE = -2
WHERE ACCOUNTID = 1;
COMMIT;

Now the second transaction updates accountid 2 with an amount 5-6=-1 and commits:

UPDATE your_table_name
SET BALANCE = -1
WHERE ACCOUNTID = 2;
COMMIT;

The final result after committing both transactions is:

+------------+---------+------------+
| ACCOUNTID  | BALANCE | CUSTOMERID |
+------------+---------+------------+
| 1          | -1      | 27         |
| 2          | -2      | 27         |
+------------+---------+------------+

That violates the integrity constraint of keeping a non negative joint balance. The reason is that both transactions observe the same snapshot.

By using SELECT FOR UPDATE in both transactions, the first transaction doing the update over the pair of involved rows will succeed, while the second one will be aborted because the UPDATE of the first transaction over accountid 1 will conflict with the SELECT FOR UPDATE performed by the second transaction over accountid 1, and thus, resulting in its abortion.

SELECT FOR UPDATE has the same syntax as SELECT except using the clause FOR UPDATE at the end of the SELECT.

3.2. UNION, INTERSECT, and EXCEPT Statements

The union, intersect, and except statements can combine resultsets in different ways performing the set operations union, intersect and difference, respectively, and have the following syntax:

unionStatement=
 	queryStatement UNION [ ALL | DISTINCT ] queryStatement
intersectStatement=
	queryStatement INTERSECT [ ALL | DISTINCT ] queryStatement
exceptStatement=
	queryStatement EXCEPT [ ALL | DISTINCT ] queryStatement

MINUS is not supported but EXCEPT is equivalent to MINUS.

3.3. WITH Statement

The second form of a query has a with clause that has the following syntax:

withStatement=
	WITH withItem [, withItem]* queryStatement
withItem=
	name [ '(' column [, column ]* ')' ]
		AS '(' queryStatement ')'....

3.4. VALUES Statement

The VALUES statement basically defines a table literal with one or more rows, that is, it is a table value constructor. Its syntax is:

values=
	VALUES rowExpression [, rowExpression ]*

Where a rowExpression is specified as follows:

rowExpression=
	expression
	| [ROW] '(' expression [, expression]* ')'

3.5. Subqueries

A scalar sub-query is a sub-query used as an expression. If the sub-query returns no rows, the value is NULL; if it returns more than one row, it is an error. IN, EXISTS and scalar sub-queries can occur in any place where an expression can occur (such as the SELECT clause, WHERE clause, ON clause of a JOIN, or as an argument to an aggregate function).

An IN, EXISTS, or scalar sub-query may be correlated, that is, it may refer to tables in the FROM clause of an enclosing query.

4. UPSERT, INSERT, and UPDATE Statements

For modifying the database there are four kinds of statements allowing to upsert, insert, update, and delete rows from the database. Upsert updates or insert depending whether a row exists with the provided primary key or not. Insert adds a new row with that primary key, if there is no row with that primary key already in the table (otherwise it returns an error). Update modifies a row with a particular primary key, if there is a row in the table with that primary (otherwise it returns an error). Upsert is preferred over insert and update since it is the most efficient. Insert and update both require to perform an additional read to check whether the row exists or not, basically duplicating the cost of the statement. Their syntax is as follows:

upsertStatement=
	UPSERT INTO tablePrimary
		[ '(' column [, column ]* ')' ]
			queryStatement

insertStatement=
	INSERT INTO tablePrimary
		[ '(' column [, column ]* ')' ]
 			queryStatement

updateStatement=
 	UPDATE tablePrimary
 		SET assign [, assign ]*
 			[ WHERE booleanExpression ]

deleteStatement=
	DELETE FROM tablePrimary [ [ AS ] alias ]
		[ WHERE booleanExpression ]

In INSERT and UPSERT statements if the list of target columns is not provided, then the statement must have the same number of columns as the target table and the order of the fields in the VALUES clause must be the same as the order of the fields in the table.

5. SET ISOLATION LEVEL Statement

The current isolation level can be changed within the transaction using the SET ISOLATION LEVEL statement that has the following syntax:

setIsolationLevelStatement=
	SET ISOLATION LEVEL [TRANSACTION]
		LOADER | NONE
		| RAW READ COMMITTED | RRC | READ UNCOMMITTED
		| READ COMMITTED | RC
		| SNAPSHOT ISOLATION | SI | SERIALIZABLE | REPEATABLE READ
			| SESSION [CONSISTENCY]

LeanXcale has multiversion concurrency control that provides highly efficient isolation management between reads and writes. The isolation levels were defined multiversion concurrency control was invented and are based on the notion of locking. LeanXcale supports 4 isolation levels:

  • LOADER (same as NONE). This mode is similar to the NONE standard mode in that it does not have neither concurrency control nor logging. It should be used for massive loads when nobody else is using the database, for instance, a night batch ingesting massive data.

  • RAW READ COMMITTED (same as RRC or READ UNCOMMITTED). This mode is a little more relaxed than READ COMMITTED and allows to read the most current version of any row. It is interesting for applications where one does not want to read from a consistent snapshot in the commit order, but instead the most recent value of the row. When setting READ UNCOMMITTED, this is the mode used that provides more isolation.

  • READ COMMITTED (same as RC). For each SQL statement it uses the most recent consistent snapshot (i.e., longest prefix of committed transactions in commit timestamp order). The equivalent in multiversion concurrency control of locking read committed.

  • SNAPSHOT ISOLATION (same as SI, SERIALIZABLE, REPEATABLE READ, SESSION, SESSION CONSISTENCY). The transaction at start time it gets the current snapshot and performs all reads within the transaction from this snapshot plus the updates performed by the transaction itself. It is roughly similar to SERIALIZABLE.

6. EXPLAIN Statement

The EXPLAIN statement information about the execution plan of a SQL query. The execution plan describes how the database engine will execute the query and retrieve the required data. This information can be helpful for database administrators, developers, and analysts to optimize and tune their queries for better performance:

explainStatement=
	EXPLAIN PLAN
		[ WITH TYPE | WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION ]
		[ EXCLUDING ATTRIBUTES | INCLUDING [ ALL ] ATTRIBUTES ]
		[ AS JSON | AS XML ]
			FOR ( queryStatement | insertStatement | updateStatement | deleteStatement )

7. SQL Hints

A hint is an instruction to the optimizer. When writing SQL, you may know information about the data unknown to the optimizer. Hints enable you to force decisions otherwise made by the optimizer.

  • Planner enforcers: thereโ€™s no perfect planner, so it makes sense to implement hints to allow user better control the execution. For instance: โ€œnever merge this subquery with othersโ€ (/+ no_merge */). โ€œtreat those tables as leading onesโ€ (/+ leading */) to affect join ordering, etc;

  • Append meta data/statistics: some statistics like โ€œtable index for scanโ€ or โ€œskew info of some shuffle keysโ€ are somehow dynamic for the query, it would be very convenient to config them with hints because the planning metadata from the planner might not be sufficiently accurate.

  • Operator resource constraints: for many cases, a default resource configuration is assigned for the execution operators, i.e., min parallelism, memory (resource consuming UDF), special resource requirement (GPU or SSD disk), …​

LeanXcale supports two kinds of hints:

  • Query Hint: right after the SELECT keyword;

  • Table Hint: right after the referenced table name.

This is the syntax:

query=
	SELECT /*+ hints */
		...
	FROM
		tableName /*+ hints */
	JOIN
		tableName /*+ hints */
		...

hints=
	hintItem[, hintItem ]*

hintItem=
	hintName
	| hintName(optionKey=optionVal[, optionKey=optionVal ]*)
	| hintName(hintOption [, hintOption ]*)

optionKey=
	simpleIdentifier
	| stringLiteral

optionVal=
	stringLiteral

hintOption=
	simpleIdentifier
	| numericLiteral
	| stringLiteral

8. EXEC Statement

The EXEC statement enables to invoke a table function. Its syntax is:

execStatement=
	EXEC '(' [ nonQueryExpression [, nonQueryExpression]* ] ')'

The parameters can be any expression that is not a query returning a value of the parameter type.

9. BATCH Statement

When auto-commit is set, and requesting an insert/update/delete operation from the result of a select statement with a large resultset, one might one instead of performing a large commit of the whole set of updates, a commit for smaller batches of updates. This can be performed with the BATCH statement that changes the implicit behavior of auto-commit. Instead of committing one time at the end of the statement, it will commit every number of indicate updates to commit every number of times indicated in the parameter of the BATCH statement:

batchStatement=
	BATCH unsignedIntLiteral   |   BATCH unsignedIntLiteral (sqlInsert | sqlUpdate | sqlDelete)

10. Identifiers

Identifiers are the names of tables, columns and other metadata elements used in an SQL query. Identifiers can be quoted or unquoted. Quoted identifiers, such as "Employee Name", start and end with double quotes. Unquoted identifiers, such as employee, must start with a letter and can only contain letters, digits, and underscores. They are implicitly converted to upper case. Identifiers are case insensitive when unquoted (actually, it is as they were written in CAPS), and case sensitive when quoted. Quoted identifiers may contain virtually any character, including spaces and other punctuation. If you wish to include a double quote in an identifier, use another double quote to escape it, as in this example: "An employee called ""Fred"".". Matching identifiers to the name of the referenced object is case-sensitive.

11. Data Types

11.1. Scalar Data Types

Data type Description Range and example literals Can be stored

BOOLEAN

Logical values

Values: TRUE, FALSE, UNKNOWN

Yes

SMALLINT

2 byte signed integer

Range is -32768 to 32767

Yes

INTEGER, INT

4 byte signed integer

Range is -2147483648 to 2147483647

Yes

BIGINT

8 byte signed integer

Range is -9223372036854775808 to 9223372036854775807

Yes

REAL, FLOAT

4 byte floating point

6 decimal digits precision

Yes

DOUBLE

8 byte floating point

15 decimal digits precision

Yes

DECIMAL(p, s)

Fixed point

Example: 123.45 is a DECIMAL(5, 2) value.

Yes

NUMERIC

Fixed point

Yes

CHAR(n), CHARACTER(n)

Fixed-width character string

โ€˜Helloโ€™, โ€˜โ€™ (empty string), _latin1โ€™Helloโ€™, nโ€™Helloโ€™, _UTF16โ€™Helloโ€™, โ€˜Helloโ€™ โ€˜thereโ€™ (literal split into multiple parts)

Yes

VARCHAR(n), CHARACTER VARYING(n)

Variable-length character string

As CHAR(n)

Yes

BINARY(n)

Fixed-width binary string

xโ€™45F0ABโ€™, xโ€™โ€™ (empty binary string), xโ€™ABโ€™ โ€˜CDโ€™ (multi-part binary string literal)

Yes

VARBINARY(n), BINARY VARYING(n)

Variable-length binary string

As BINARY(n)

Yes

DATE

Date

Example: DATE โ€˜1969-07-20โ€™

Yes

TIME

Time of day

Example: TIME โ€˜20:17:40โ€™

Yes. This datatype has milliseconds precision

TIMESTAMP

Date and time

Example: TIMESTAMP โ€˜1969-07-20 20:17:40โ€™

Yes. This datatype has microseconds precision

INTERVAL timeUnit [ TO timeUnit ]

Date time interval

Examples: INTERVAL โ€˜1-5โ€™ YEAR TO MONTH, INTERVAL โ€˜45โ€™ DAY, INTERVAL โ€˜1 2:34:56.789โ€™ DAY TO SECOND

No

TIMESTAMP WITH TIME ZONE

Date and time with time zone

Example: TIMESTAMP โ€˜1969-07-20 20:17:40 America/Los Angelesโ€™

No

TIMESTAMP WITH LOCAL TIME ZONE

Date and time with time zone

Example: TIMESTAMP โ€˜1969-07-20 20:17:40 America/Los Angelesโ€™

No

GEOMETRY

Geometry

Examples: ST_GeomFromText(โ€˜POINT (30 10)โ€™)

No

BIGINT ARRAY

Array of longs

[1, 2, 3]

Yes

DOUBLE ARRAY

Array of doubles

[1.1, 2.2, 3.3]

Yes

VARCHAR ARRAY

Array of strings

['string1', 'string2']

Yes

11.2. Non Scalar Types

LeanXcale currently supports the following non scalar types:

Type Description Example literals Can be stored

ARRAY

Ordered, contiguous collection that may contain duplicates

Example: varchar(10) array

Yes

MULTISET

Unordered collection that may contain duplicates

Example: int multiset

No

MAP

Collection of keys mapped to values

No

CURSOR

Cursor over the result of executing a query

No

11.2.1. Array Data Type

A sentence to create a table with array types could be the following:

CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR ARRAY, PRIMARY KEY(id));

Or the following, if you want to restrict the number of characters of each VARCHAR position in the VARCHAR ARRAY.

CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR (20) ARRAY, PRIMARY KEY(id));

An array literal has the following syntax:

arrayLiteral=
	ARRAY '[' value [, value]* ']'

If we query the above sample table TARRY, the result will be shown this way:

SELECT * FROM tarry
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID |                     LONGS                      |                    DOUBLES                     |         STRINGS         |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| 1  | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
+----+------------------------------------------------+------------------------------------------------+-------------------------+

In order to insert a NULL value to an array field, can be done in two ways:

  • Replace the NULL keyword for the emptyList() function.

INSERT INTO tarry VALUES (2, ARRAY[2334589765239847563, 0, 44], emptyList(), ARRAY['CUENCA','BILBAO']);
  • Specify only the columns in which we want to insert values. In unlisted columns, NULL values will be inserted.

INSERT INTO tarry (id, longs, strings) VALUES (2, array[2334589765239847563, 0, 44], ARRAY['CUENCA','BILBAO']);

The result will be:

+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID |                     LONGS                      |                    DOUBLES                     |         STRINGS         |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| 1  | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
| 2  | [2334589765239847563, 0, 44]                   |                                                | [CUENCA, BILBAO]        |
+----+------------------------------------------------+------------------------------------------------+-------------------------+

11.2.2. Non Scalar Value Constructors

The next table summarizes the constructors for non scalar values:

Operator syntax Description

ROW (value [, value ]*)

Creates a row from a list of values.

(value [, value ]* )

Creates a row from a list of values.

MAP โ€˜[โ€™ key โ€˜]โ€™

Returns the element of a map with a particular key.

ARRAY โ€˜[โ€™ index โ€˜]โ€™

Returns the element at a particular location in an array.

ARRAY โ€˜[โ€™ value [, value ]* โ€˜]โ€™

Creates an array from a list of values.

MAP โ€˜[โ€™ key, value [, key, value ]* โ€˜]โ€™

Creates a map from a list of key-value pairs.

11.2.3. Non Scalar Functions

The supported functions for non scalar types are shown in the below table:

Operator syntax Description

ELEMENT(value)

Returns the sole element of an array or multiset; null if the collection is empty; throws if it has more than one element.

CARDINALITY(value)

Returns the number of elements in an array or multiset.

value MEMBER OF multiset

Returns whether the value is a member of multiset.

multiset IS A SET

Whether multiset is a set (has no duplicates).

multiset IS NOT A SET

Whether multiset is not a set (has duplicates).

multiset IS EMPTY

Whether multiset contains zero elements.

multiset IS NOT EMPTY

Whether multiset contains one or more elements.

multiset SUBMULTISET OF multiset2

Whether multiset is a submultiset of multiset2.

multiset NOT SUBMULTISET OF multiset2

Whether multiset is not a submultiset of multiset2.

multiset MULTISET UNION [ ALL | DISTINCT ] multiset2

Returns the union multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET INTERSECT [ ALL | DISTINCT ] multiset2

Returns the intersection of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET EXCEPT [ ALL | DISTINCT ] multiset2

Returns the difference of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

Note: the UNNEST relational operator converts a collection to a relation.

12. Type Conversion

LeanXcale supports both implicit and explicit conversion in expressions.

12.1. Implicit and Explicit Type Conversion

It is recommended to specify explicit conversions, rather than implicit conversions, for these reasons:

  • SQL statements are easier to understand when you use explicit datatype conversion functions.

  • Implicit datatype conversion can have a negative impact on performance, especially if the datatype of a column value is converted to that of a constant rather than the other way around.

  • Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR value may return an unexpected format.

12.2. Explicit Type Conversion

Explicit type conversion is attained with the CAST function:

Operator syntax Description

CAST(value AS type)

Converts a value to type.

The supported types in the CAST function are:

type=
	typeName
		[ collectionsTypeName ]*

typeName=
	sqlTypeName
	| rowTypeName
 	| compoundIdentifier

sqlTypeName=
	CHAR [ precision ] [ charSet ]
	| VARCHAR [ precision ] [ charSet ]
	| DATE
	| TIME
	| TIMESTAMP
	| GEOMETRY
	| DECIMAL [ precision [, scale] ]
	| BOOLEAN
	| TINYINT
	| SMALLINT
	| INTEGER
	| BIGINT
	| BINARY [ precision ]
	| VARBINARY [ precision ]
	| REAL
	| DOUBLE
	| FLOAT
	| ANY [ precision [, scale] ]

collectionsTypeName=
	ARRAY | MULTISET

rowTypeName=
	ROW '('
		fieldName1 fieldType1 [ NULL | NOT NULL ]
		[ , fieldName2 fieldType2 [ NULL | NOT NULL ] ]*
	')'

char=
	CHARACTER | CHAR

varchar=
	char VARYING | VARCHAR

decimal=
	DECIMAL | DEC | NUMERIC

integer:
	INTEGER | INT

varbinary=
	BINARY VARYING | VARBINARY

double=
	DOUBLE [ PRECISION ]

time=
	TIME [ precision ] [ timeZone ]

timestamp=
	TIMESTAMP [ precision ] [ timeZone ]

charSet=
	CHARACTER SET charSetName

timeZone=
	WITHOUT TIME ZONE
	| WITH LOCAL TIME ZONE

12.3. Implicit Type Conversion

refers to the automatic conversion of one data type to another by the database system without requiring explicit instructions from the programmer. This conversion occurs during the execution of SQL statements or expressions. The following table shows all possible conversions, without regard to the context in which it is made. The following convention is used:

  • i: implicit cast

  • e: explicit cast

  • x: not allowed

FROM - TO NULL BOOLEAN TINYINT SMALLINT INT BIGINT DECIMAL FLOAT or REAL DOUBLE INTERVAL DATE TIME TIMESTAMP CHAR or VARCHAR BINARY or VARBINARY

NULL

i

i

i

i

i

i

i

i

i

i

i

i

i

i

i

BOOLEAN

x

i

e

e

e

e

e

e

e

x

x

x

x

i

x

TINYINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

SMALLINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

INT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

BIGINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

DECIMAL

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

FLOAT/REAL

x

e

i

i

i

i

i

i

i

x

x

x

e

i

x

DOUBLE

x

e

i

i

i

i

i

i

i

x

x

x

e

i

x

INTERVAL

x

x

e

e

e

e

e

x

x

i

x

x

x

e

x

DATE

x

x

x

x

x

x

x

x

x

x

i

x

i

i

x

TIME

x

x

x

x

x

x

x

x

x

x

x

i

e

i

x

TIMESTAMP

x

x

e

e

e

e

e

e

e

x

i

e

i

i

x

CHAR or VARCHAR

x

e

i

i

i

i

i

i

i

i

i

i

i

i

i

BINARY or VARBINARY

x

x

x

x

x

x

x

x

x

x

e

e

e

i

i

12.3.1. Conversion Contexts and Strategies

  • Set operation (UNION, EXCEPT, INTERSECT): Analyze the data types of each row in every branch and determine the common type for each pair of fields

  • Binary arithmetic expression (+, -, &, ^, /, %): promote string operand to data type of the other numeric operand;

  • Binary comparison (=, <, , <>, >, >=): if operands are STRING and TIMESTAMP, promote to TIMESTAMP; make 1 = true and 0 = false always evaluate to TRUE. If there is numeric type operand, find common type for both operands.

  • IN sub-query: compare type of left hand side and right hand side, and find the common type. If it is a non scalar type, find the widest type for every field.

  • IN expression list: compare every expression to find the common type.

  • CASE WHEN expression or COALESCE: find the common widest type of the THEN and ELSE operands.

  • Character + INTERVAL or character - INTERVAL: promote character to TIMESTAMP.

  • Built-in function: look up the type families registered in the checker, find the family default type if checker rules allow it.

  • User-defined function (UDF): coerce based on the declared argument types of the eval() method.

  • INSERT and UPDATE: coerce a source field to counterpart target table fieldโ€™s type if the two fields differ with type name or precision(scale).

Note:

Implicit type coercion of the following cases is ignored:

  • One of the type is ANY;

  • Type coercion within CHARACTER types are always ignored, i.e. from CHAR(20) to VARCHAR(30);

  • Type coercion from a numeric to another with higher precedence is ignored, i.e. from INT to LONG.

12.3.2. Strategies for Finding Common Type

  • If the operator has the expected data types, just take them as the desired one, e.g., the UDF would have eval() method which has reflection argument types.

  • If there is no expected data type but the data type families are registered, try to coerce the arguments to the familyโ€™s default data type, i.e. the String family will have a VARCHAR type.

  • If neither expected data type nor families are specified, try to find the tightest common type of the node types, i.e., INTEGER and DOUBLE will return DOUBLE, the numeric precision does not lose for this case.

  • If no tightest common type is found, try to find a wider type, i.e. VARCHAR and INTEGER will return INTEGER, we allow some precision loss when widening decimal to fractional, or promote to VARCHAR type.

13. Operators and Functions

The following sections provides a summary of all operators and functions supported by LeanXcale.

13.1. Comparison Operators

The following table provides the list of supported comparison operators:

Operator syntax Description

value1 = value2

Equals

value1 <> value2

Not equal

value1 > value2

Greater than

value1 >= value2

Greater than or equal

value1 < value2

Less than

value1 <= value2

Less than or equal

value IS NULL

Whether value is null

value IS NOT NULL

Whether value is not null

value1 IS DISTINCT FROM value2

Whether two values are not equal, treating null values as the same

value1 IS NOT DISTINCT FROM value2

Whether two values are equal, treating null values as the same

value1 BETWEEN value2 AND value3

(value1 >= value2) AND (value1 <= value3)

value1 NOT BETWEEN value2 AND value3

Whether value1 is less than value2 or greater than value3

string1 LIKE patternString [ ESCAPE string3 ]

Whether string1 matches pattern patternString

string1 NOT LIKE patternString [ ESCAPE string3 ]

Whether string1 does not match pattern patternString

string1 SIMILAR TO patternString [ ESCAPE string3 ]

Whether string1 matches regular expression patternString

string1 NOT SIMILAR TO patternString [ ESCAPE string3 ]

Whether string1 does not match regular expression patternString

value IN (value [, value]*)

Whether value is equal to a value in a list

value NOT IN (value [, value]*)

Whether value is not equal to any value in a list

value IN (sub-query)

Whether value is equal to a row returned by sub-query

value NOT IN (sub-query)

Whether value is not equal to any row returned by sub-query

value comparison SOME (sub-query)

Whether value fulfills the comparison for at least one row returned by sub-query

value comparison ANY (sub-query)

Synonym for SOME

value comparison ALL (sub-query)

Whether value fulfills the comparison for every row returned by sub-query

EXISTS (sub-query)

Whether sub-query returns at least one row

13.2. Regular Expressions

LIKE and SIMILAR TO operators allow to perform pattern matching using regular expressions. You can use the special characters ('%','_') and you can also use regular expressions with a syntax similar to perl-like regular expressions. Specifically, regular expressions may contain.

  • char or char sequence: A single character or a sequence of characters that matches itself

  • '.': The dot is a wildcard that matches any single character except for a newline

  • '^': Matches the start of a line

  • '[a-z]', Represents a character class. In this example, it matches any lowercase letter from 'a' to 'z'

  • '[^a-z]': Represents a negated character class. It matches any character that is not a lowercase letter from 'a' to 'z'

  • '\a': Represents a predefined character class for alphanumeric characters. It includes letters (upper and lower case) and digits

  • '\u': Represents a predefined character class for uppercase letters

  • '\l': Represents a predefined character class for lowercase letters

  • '\b': Represents a predefined character class for whitespace characters, such as space and tab

  • '\w': Represents a predefined character class for word characters. It includes letters, digits, and underscore

They may also contain the following operators (from lowest to highest precedence):

  • '(' and ')': for grouping

  • '|': to provide alternatives

  • concatenation: implicit, no operator

  • '*': zero or more times

  • '+': one or more times

  • '?': zero or one time

It should be noted that the main difference between LIKE and SIMILAR TO is that LIKE implies the expression will start with the text in the expression while SIMILAR doesn’t imply '^'.

13.3. Logical Operators

The table below outlines the logical operators:

Operator syntax Description

boolean1 OR boolean2

Whether boolean1 is TRUE or boolean2 is TRUE

boolean1 AND boolean2

Whether boolean1 and boolean2 are both TRUE

NOT boolean

Whether boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN

boolean IS FALSE

Whether boolean is FALSE; returns FALSE if boolean is UNKNOWN

boolean IS NOT FALSE

Whether boolean is not FALSE; returns TRUE if boolean is UNKNOWN

boolean IS TRUE

Whether boolean is TRUE; returns FALSE if boolean is UNKNOWN

boolean IS NOT TRUE

Whether boolean is not TRUE; returns TRUE if boolean is UNKNOWN

boolean IS UNKNOWN

Whether boolean is UNKNOWN

boolean IS NOT UNKNOWN

Whether boolean is not UNKNOWN

13.4. Arithmetic Operators and Functions

This table provides an overview of arithmetic operators and functions:

Operator syntax Description

+ numeric

Returns numeric

- numeric

Returns negative numeric

numeric1 + numeric2

Returns numeric1 plus numeric2

numeric1 - numeric2

Returns numeric1 minus numeric2

numeric1 * numeric2

Returns numeric1 multiplied by numeric2

numeric1 / numeric2

Returns numeric1 divided by numeric2

POWER(numeric1, numeric2)

Returns numeric1 raised to the power of numeric2

ABS(numeric)

Returns the absolute value of numeric

MOD(numeric1, numeric2)

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

SQRT(numeric)

Returns the square root of numeric

LN(numeric)

Returns the natural logarithm (base e) of numeric

LOG10(numeric)

Returns the base 10 logarithm of numeric

EXP(numeric)

Returns e raised to the power of numeric

CEIL(numeric)

Rounds numeric up, returning the smallest integer that is greater than or equal to numeric

FLOOR(numeric)

Rounds numeric down, returning the largest integer that is less than or equal to numeric

RAND([seed])

Generates a random double between 0 and 1 inclusive, optionally initializing the random number generator with seed

RAND_INTEGER([seed, ] numeric)

Generates a random integer between 0 and numeric - 1 inclusive, optionally initializing the random number generator with seed

ACOS(numeric)

Returns the arc cosine of numeric

ASIN(numeric)

Returns the arc sine of numeric

ATAN(numeric)

Returns the arc tangent of numeric

ATAN2(numeric, numeric)

Returns the arc tangent of the numeric coordinates

CBRT(numeric)

Returns the cube root of numeric

COS(numeric)

Returns the cosine of numeric

COT(numeric)

Returns the cotangent of numeric

DEGREES(numeric)

Converts numeric from radians to degrees

PI()

Returns a value that is closer than any other value to pi

RADIANS(numeric)

Converts numeric from degrees to radians

ROUND(numeric1 [, numeric2])

Rounds numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

SIGN(numeric)

Returns the signum of numeric

SIN(numeric)

Returns the sine of numeric

TAN(numeric)

Returns the tangent of numeric

TRUNCATE(numeric1 [, numeric2])

Truncates numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

13.5. String Operators and Functions

A summary of string operators and functions is depicted in the below table:

Operator syntax Description

string || string

Concatenates two character strings

CHAR_LENGTH(string)

Returns the number of characters in a character string

CHARACTER_LENGTH(string)

As CHAR_LENGTH(string)

UPPER(string)

Returns a character string converted to upper case

LOWER(string)

Returns a character string converted to lower case

POSITION(string1 IN string2)

Returns the position of the first occurrence of string1 in string2

POSITION(string1 IN string2 FROM integer)

Returns the position of the first occurrence of string1 in string2 starting at a given point (not standard SQL)

TRIM( \{ BOTH | LEADING | TRAILING } string1 FROM string2)

Removes the longest string containing only the characters in string1 from the start/end/both ends of string1

OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])

Replaces a substring of string1 with string2

SUBSTRING(string FROM integer)

Returns a substring of a character string starting at a given point

SUBSTRING(string FROM integer FOR integer)

Returns a substring of a character string starting at a given point with a given length

INITCAP(string)

Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

13.6. String Operators and Functions

The following table provides a summary of string operators and functions:

Operator syntax Description

binary || binary

Concatenates two binary strings

POSITION(binary1 IN binary2)

Returns the position of the first occurrence of binary1 in binary2

POSITION(binary1 IN binary2 FROM integer)

Returns the position of the first occurrence of binary1 in binary2 starting at a given point (not standard SQL)

OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ])

Replaces a substring of binary1 with binary2

SUBSTRING(binary FROM integer)

Returns a substring of binary starting at a given point

SUBSTRING(binary FROM integer FOR integer)

Returns a substring of binary starting at a given point with a given length

13.7. Operator Precedence

The following table summarizes the operator precedence and associativity, from highest to lowest.

Operator Associativity

.

left

::

left

[ ] (array element)

left

+ - (unary plus, minus)

right

* / %

left

+ -

left

BETWEEN, IN, LIKE, SIMILAR, OVERLAPS, CONTAINS etc.

-

< > = ⇐ >= <> !=

left

IS NULL, IS FALSE, IS NOT TRUE etc.

-

NOT

right

AND

left

OR

left

13.8. Date and Time Functions

The below table provides a quick reference of Data and Time Functions:

Operator syntax Description

LOCALTIME

Returns the current date and time in the session time zone in a value of datatype TIME

LOCALTIME(precision)

Returns the current date and time in the session time zone in a value of datatype TIME, with precision digits of precision

LOCALTIMESTAMP

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP

LOCALTIMESTAMP(precision)

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP, with precision digits of precision

CURRENT_TIME

Returns the current time in the session time zone, in a value of datatype TIMESTAMP

CURRENT_DATE

Returns the current date in the session time zone, in a value of datatype DATE

CURRENT_TIMESTAMP

Returns the current date and time in the session time zone, in a value of datatype TIMESTAMP

EXTRACT(timeUnit FROM datetime)

Extracts and returns the value of a specified datetime field from a datetime value expression

FLOOR(datetime TO timeUnit)

Rounds datetime down to timeUnit

CEIL(datetime TO timeUnit)

Rounds datetime up to timeUnit

YEAR(date)

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

QUARTER(date)

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

MONTH(date)

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

WEEK(date)

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

DAYOFYEAR(date)

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

DAYOFMONTH(date)

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

DAYOFWEEK(date)

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

HOUR(date)

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

MINUTE(date)

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

SECOND(date)

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

TIMESTAMPADD(timeUnit, integer, datetime)

Returns datetime with an interval of (signed) integer timeUnits added. Equivalent to datetime + INTERVAL 'integer' timeUnit

TIMESTAMPDIFF(timeUnit, datetime, datetime2)

Returns the (signed) number of timeUnit intervals between datetime and datetime2.

LAST_DAY(date)

Returns the date of the last day of the month in a value of datatype DATE; For example, it returns DATEโ€™2020-02-29โ€™ for both DATEโ€™2020-02-10โ€™ and TIMESTAMPโ€™2020-02-10 10:10:10โ€™

Where:

timeUnit=
	MILLENNIUM | CENTURY | DECADE | YEAR
	| QUARTER | MONTH | WEEK | DOY | DOW | DAY
	| HOUR | MINUTE | SECOND | EPOCH

Note:

  • As stated above, by default, DATE, TIME and TIMESTAMP have no time zone. For those types, there is no implicit time zone, such as UTC (as in Java) or the local time zone. It is left to the user or application to supply a time zone. In turn, TIMESTAMP WITH LOCAL TIME ZONE does not store the time zone internally, times will be stored in UTC and it will rely on the supplied time zone to provide correct semantics.

  • Interval literals may only use time units YEAR, MONTH, DAY, HOUR, MINUTE and SECOND.

====Conditional Functions and Operators

You can find an overview of conditional functions and operators in the following table:

Operator syntax Description

CASE value
WHEN value1 [, value11 ]* THEN result1
[ WHEN valueN [, valueN1 ]* THEN resultN ]*
[ ELSE resultZ ]
END

Enumerated values

CASE
WHEN condition1 THEN result1
[ WHEN conditionN THEN resultN ]*
[ ELSE resultZ ]
END

Enumerated conditions

NULLIF(value, value)

Returns NULL if the values are the same.

For example, NULLIF(5, 5) returns NULL; NULLIF(5, 0) returns 5.

COALESCE(value, value [, value ]*)

returns the value of the first expression that is not NULL, or NULL otherwise

For example, COALESCE(NULL, 5) returns 5.

13.9. Period Functions

A period is defined by two points in time and can be done in the following ways:

period=
	(datetime, datetime)
	| (datetime, interval)
	| PERIOD (datetime, datetime)
	| PERIOD (datetime, interval)

There are a number of supported functions over periods:

Operator syntax Description

period1 CONTAINS datetime dt

Checks if the time period represented by period1 contains the specified datetime 'dt'.

period1 CONTAINS period2

Verifies if the time period represented by period1 contains the entire time period period2.

period1 OVERLAPS period2

Determines if there is any overlap between the time periods represented by period1 and period2. If there is any common time interval, the condition is true.

period1 EQUALS period2

Checks if the time period represented by period1 is equal to the time period period2 in its entirety.

period1 PRECEDES period2

Validates whether the time period represented by period1 comes before (precedes) the time period period2. It doesn’t necessarily mean immediate succession.

period1 IMMEDIATELY PRECEDES period2

Checks if the time period represented by period1 immediately precedes the time period period2 without any gap in between.

period1 SUCCEEDS period2

Verifies whether the time period represented by period1 comes after (succeeds) the time period period2. Similar to PRECEDES, it doesn’t imply immediate succession.

period1 IMMEDIATELY SUCCEEDS period2

Checks if the time period represented by period1 immediately succeeds the time period period2 without any gap.

Where period1 and period2 are period expressions.

13.10. Aggregate functions

Syntax:

aggregateCall=
	agg( [ ALL | DISTINCT ] value [, value ]*)
		[ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
		[ FILTER (WHERE condition) ]
	| agg(*) [ FILTER (WHERE condition) ]

where agg is one of the operators in the following table, or a user-defined aggregate function.

If FILTER is present, the aggregate function only considers rows for which condition evaluates to TRUE.

If DISTINCT is present, duplicate argument values are eliminated before being passed to the aggregate function.

If WITHIN GROUP is present, the aggregate function sorts the input rows according to the ORDER BY clause inside WITHIN GROUP before aggregating values. WITHIN GROUP is only allowed for hypothetical set functions (RANK, DENSE_RANK, PERCENT_RANK and CUME_DIST), inverse distribution functions (PERCENTILE_CONT and PERCENTILE_DISC) and collection functions (COLLECT and LISTAGG).

In orderItem, if expression is a positive integer n, it denotes the nth item in the SELECT clause. As an example:

SELECT F1, F2, F3, F4 FROM T ORDER BY 1,2

This will order by F1 and then F2 which are the first and second in the SELECT clause.

Operator syntax Description

COLLECT( [ ALL | DISTINCT ] value)

Returns a multiset of the values

LISTAGG( [ ALL | DISTINCT ] value [, separator])

Returns values concatenated into a string, delimited by separator (default โ€˜,โ€™)

COUNT( [ ALL | DISTINCT ] value [, value ]*)

Returns the number of input rows for which value is not null (wholly not null if value is composite)

COUNT(*)

Returns the number of input rows

FUSION(multiset)

Returns the multiset union of multiset across all input values

APPROX_COUNT_DISTINCT(value [, value ]*)

Returns the approximate number of distinct values of value; the database is allowed to use an approximation but is not required to

AVG( [ ALL | DISTINCT ] numeric)

Returns the average (arithmetic mean) of numeric across all input values

SUM( [ ALL | DISTINCT ] numeric)

Returns the sum of numeric across all input values

MAX( [ ALL | DISTINCT ] value)

Returns the maximum value of value across all input values

MIN( [ ALL | DISTINCT ] value)

Returns the minimum value of value across all input values

ANY_VALUE( [ ALL | DISTINCT ] value)

Returns one of the values of value across all input values; this is NOT specified in the SQL standard

BIT_AND( [ ALL | DISTINCT ] value)

Returns the bitwise AND of all non-null input values, or null if none

BIT_OR( [ ALL | DISTINCT ] value)

Returns the bitwise OR of all non-null input values, or null if none

BIT_XOR( [ ALL | DISTINCT ] value)

Returns the bitwise XOR of all non-null input values, or null if none

STDDEV_POP( [ ALL | DISTINCT ] numeric)

Returns the population standard deviation of numeric across all input values

STDDEV_SAMP( [ ALL | DISTINCT ] numeric)

Returns the sample standard deviation of numeric across all input values

STDDEV( [ ALL | DISTINCT ] numeric)

Synonym for STDDEV_SAMP

VAR_POP( [ ALL | DISTINCT ] value)

Returns the population variance (square of the population standard deviation) of numeric across all input values

VAR_SAMP( [ ALL | DISTINCT ] numeric)

Returns the sample variance (square of the sample standard deviation) of numeric across all input values

COVAR_POP(numeric1, numeric2)

Returns the population covariance of the pair (numeric1, numeric2) across all input values

COVAR_SAMP(numeric1, numeric2)

Returns the sample covariance of the pair (numeric1, numeric2) across all input values

REGR_COUNT(numeric1, numeric2)

Returns the number of rows where both dependent and independent expressions are not null

REGR_SXX(numeric1, numeric2)

Returns the sum of squares of the dependent expression in a linear regression model

REGR_SYY(numeric1, numeric2)

Returns the sum of squares of the independent expression in a linear regression model

13.11. Window Functions

Window functions are a category of SQL functions that operate on a specific "window" or subset of rows within a result set. These functions provide a way to perform calculations across a range of rows that are related to the current row. The syntax is as follows:

windowedAggregateCall=
	agg( [ ALL | DISTINCT ] value [, value ]*)
		[ RESPECT NULLS | IGNORE NULLS ]
		[ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
		[ FILTER (WHERE condition) ]
			OVER window
	| agg(*)
		[ FILTER (WHERE condition) ]
		OVER window

where agg is one of the operators in the following table, or a user-defined aggregate function.

DISTINCT, FILTER, and WITHIN GROUP behave as described for aggregate functions.

The supported operators are summarized in the following table:

Operator syntax Description

COUNT(value [, value ]*) OVER window

Returns the number of rows in window for which value is not null (wholly not null if value is composite)

COUNT(*) OVER window

Returns the number of rows in window

AVG(numeric) OVER window

Returns the average (arithmetic mean) of numeric across all values in window

SUM(numeric) OVER window

Returns the sum of numeric across all values in window

MAX(value) OVER window

Returns the maximum value of value across all values in window

MIN(value) OVER window

Returns the minimum value of value across all values in window

RANK() OVER window

Returns the rank of the current row with gaps; same as ROW_NUMBER of its first peer

DENSE_RANK() OVER window

Returns the rank of the current row without gaps; this function counts peer groups

ROW_NUMBER() OVER window

Returns the number of the current row within its partition, counting from 1

FIRST_VALUE(value) OVER window

Returns value evaluated at the row that is the first row of the window frame

LAST_VALUE(value) OVER window

Returns value evaluated at the row that is the last row of the window frame

LEAD(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

LAG(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

NTH_VALUE(value, nth) OVER window

Returns value evaluated at the row that is the nth row of the window frame

NTILE(value) OVER window

Returns an integer ranging from 1 to value, dividing the partition as equally as possible

Note:

  • You may specify null treatment (IGNORE NULLS, RESPECT NULLS) for FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD and LAG functions. The syntax handled by the parser, but only RESPECT NULLS is implemented at runtime.

13.12. Grouping Functions

LeanXcale supports the following grouping functions:

Operator syntax Description

GROUPING(expression [, expression ]*)

Returns a bit vector of the given grouping expressions

GROUP_ID()

Returns an integer that uniquely identifies the combination of grouping keys

GROUPING_ID(expression [, expression ]*)

Synonym for GROUPING

13.13. Grouped Window Functions

Grouped window functions occur in the GROUP BY clause and define a key value that represents a window containing several rows.

13.14. CTUMBLE

In streaming queries, TUMBLE assigns a window for each row of a relation based on a timestamp column. An assigned window is specified by its beginning and ending. All assigned windows have the same length, and thatโ€™s why tumbling sometimes is named as โ€œfixed windowingโ€. CTUMBLE function works in a similar way as TUMBLE does, but for regular tables.

Operator syntax Description

CTUMBLE(datetime COLUMN, interval [, time ])

Indicates a tumbling window of interval for datetime, optionally aligned at time

Here is an example:

SELECT CAST(CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000') AS TIMESTAMP) AS GTIMEST,
	keyid,
	SUM(metric1) AS SUMM1,
	SUM(metric2) AS SUMM2,
FROM TIMEMETRICS
WHERE keyid <= '1'
GROUP BY CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000'), keyid
ORDER BY 1,2

In the query above you will be getting the addition of the metrics for every key and for every hour (60 minute) interval. The aligned at value is causing the intervals to start at minute 01

The result obtained would look similar to:

GTIMEST KEYID SUMM1 SUMM2

2020-09-27 17:01:00

0

10

2.5

2020-09-27 17:01:00

1

10

2.5

2020-09-27 18:01:00

0

20

5.0

2020-09-27 18:01:00

1

20

5.0

2020-09-27 19:01:00

0

30

7.5

2020-09-27 19:01:00

1

30

7.5

13.15. Grouped Auxiliary Functions

Grouped auxiliary functions allow you to access properties of a window defined by a grouped window function:

Operator syntax Description

HOP_END(expression, slide, size [, time ])

Returns the value of expression at the end of the window defined by a HOP function call

HOP_START(expression, slide, size [, time ])

Returns the value of expression at the beginning of the window defined by a HOP function call

SESSION_END(expression, interval [, time])

Returns the value of expression at the end of the window defined by a SESSION function call

SESSION_START(expression, interval [, time])

Returns the value of expression at the beginning of the window defined by a SESSION function call

TUMBLE_END(expression, interval [, time ])

Returns the value of expression at the end of the window defined by a TUMBLE function call

TUMBLE_START(expression, interval [, time ])

Returns the value of expression at the beginning of the window defined by a TUMBLE function call

13.16. JSON Functions

LeanXcale also supports a number of functions to operate over strings containing JSON values. We use the following convention in the following subsections:

  • jsonValue is a character string containing a JSON value;

  • path is a character string containing a JSON path expression; mode flag strict or lax should be specified in the beginning of path.

13.16.1. JSON Query Functions

These are the query function predicates over JSON values:

Operator syntax Description

JSON_EXISTS(jsonValue, path [ { TRUE | FALSE | UNKNOWN | ERROR ) ON ERROR } )

Whether a jsonValue satisfies a search criterion described using JSON path expression path

JSON_VALUE(jsonValue, path [ RETURNING type ] [ { ERROR | NULL | DEFAULT expr } ON EMPTY ] [ { ERROR | NULL | DEFAULT expr } ON ERROR ] )

Extract an SQL scalar from a jsonValue using JSON path expression path

JSON_QUERY(jsonValue, path [ { WITHOUT [ ARRAY ] | WITH [ CONDITIONAL | UNCONDITIONAL ] [ ARRAY ] } WRAPPER ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON EMPTY ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON ERROR ] )

Extract a JSON object or JSON array from jsonValue using the path JSON path expression

Note:

  • The ON ERROR and ON EMPTY clauses define the fallback behavior of the function when an error is thrown or a null value is about to be returned.

  • The ARRAY WRAPPER clause defines how to represent a JSON array result in JSON_QUERY function. The following examples compare the wrapper behaviors.

Sample Data:

{"a": "[1,2]", "b": [1,2], "c": "hi"}

These are the JSON query functions: Comparison:

Operator $.a $.b $.c

JSON_VALUE

[1, 2]

error

hi

JSON QUERY WITHOUT ARRAY WRAPPER

error

[1, 2]

error

JSON QUERY WITH UNCONDITIONAL ARRAY WRAPPER

[ โ€œ[1,2]โ€ ]

[ [1,2] ]

[ โ€œhiโ€ ]

JSON QUERY WITH CONDITIONAL ARRAY WRAPPER

[ โ€œ[1,2]โ€ ]

[1,2]

[ โ€œhiโ€ ]

13.16.2. JSON Constructor Functions

These are the JSON value constructor functions:

Operator syntax Description

JSON_OBJECT( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct JSON object using a series of key (name) value (value) pairs

JSON_OBJECTAGG( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON object using a key (name) value (value) pair

JSON_ARRAY( { value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct a JSON array using a series of values (value)

JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON array using a value (value)

Note:

  • The flag FORMAT JSON indicates the value is formatted as JSON character string. When FORMAT JSON is used, the value should be de-parse from JSON character string to a SQL structured value.

  • ON NULL clause defines how the JSON output represents null values. The default null behavior of JSON_OBJECT and JSON_OBJECTAGG is NULL ON NULL, and for JSON_ARRAY and JSON_ARRAYAGG it is ABSENT ON NULL.

  • If ORDER BY clause is provided, JSON_ARRAYAGG sorts the input rows into the specified order before performing aggregation.

13.16.3. JSON Validity Operators

The following functions enable to check the validity of a JSON value:

Operator syntax Description

jsonValue IS JSON [ VALUE ]

Whether jsonValue is a JSON value

jsonValue IS NOT JSON [ VALUE ]

Whether jsonValue is not a JSON value

jsonValue IS JSON SCALAR

Whether jsonValue is a JSON scalar value

jsonValue IS NOT JSON SCALAR

Whether jsonValue is not a JSON scalar value

jsonValue IS JSON OBJECT

Whether jsonValue is a JSON object

jsonValue IS NOT JSON OBJECT

Whether jsonValue is not a JSON object

jsonValue IS JSON ARRAY

Whether jsonValue is a JSON array

jsonValue IS NOT JSON ARRAY

Whether jsonValue is not a JSON array

13.16.4. JSON Auxiliary Functions

These are some JSON auxiliary functions:

Operator syntax Description

JSON_TYPE(jsonValue)

Returns a string value indicating the type of a jsonValue

JSON_DEPTH(jsonValue)

Returns an integer value indicating the depth of a jsonValue

JSON_PRETTY(jsonValue)

Returns a pretty-printing of jsonValue

JSON_LENGTH(jsonValue [, path ])

Returns a integer indicating the length of jsonValue

JSON_KEYS(jsonValue [, path ])

Returns a string indicating the keys of a JSON jsonValue

JSON_REMOVE(jsonValue, path[, path])

Removes data from jsonValue using a series of path expressions and returns the result

Note:

  • JSON_TYPE / JSON_DEPTH / JSON_PRETTY return null if the argument is null

  • JSON_LENGTH / JSON_KEYS / JSON_REMOVE return null if the first argument is null

  • JSON_TYPE generally returns an upper-case string flag indicating the type of the JSON input. Currently supported supported type flags are:

    • INTEGER

    • STRING

    • FLOAT

    • DOUBLE

    • LONG

    • BOOLEAN

    • DATE

    • OBJECT

    • ARRAY

    • NULL

  • JSON_DEPTH defines a JSON valueโ€™s depth as follows:

    • An empty array, empty object, or scalar value has depth 1;

    • A non-empty array containing only elements of depth 1 or non-empty object containing only member values of depth 1 has depth 2;

    • Otherwise, a JSON document has depth greater than 2.

  • JSON_LENGTH defines a JSON valueโ€™s length as follows:

    • A scalar value has length 1;

    • The length of array or object is the number of elements is contains.

Usage Examples:

13.16.5. JSON_TYPE Example

This statement:

SELECT JSON_TYPE(v) AS c1,
	JSON_TYPE(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
	JSON_TYPE(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
	JSON_TYPE(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4

OBJECT

ARRAY

INTEGER

BOOLEAN

13.16.6. JSON_DEPTH Example

This statement returns:

SELECT JSON_DEPTH(v) AS c1,
	JSON_DEPTH(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
	JSON_DEPTH(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
	JSON_DEPTH(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4

3

2

1

1

13.16.7. JSON_LENGTH Example

This statement:

SELECT JSON_LENGTH(v) AS c1,
	JSON_LENGTH(v, 'lax $.a') AS c2,
	JSON_LENGTH(v, 'strict $.a[0]') AS c3,
	JSON_LENGTH(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true]}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4

1

2

1

1

13.16.8. JSON_KEYS Example

This statement:

SELECT JSON_KEYS(v) AS c1,
	JSON_KEYS(v, 'lax $.a') AS c2,
	JSON_KEYS(v, 'lax $.b') AS c2,
	JSON_KEYS(v, 'strict $.a[0]') AS c3,
	JSON_KEYS(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true],"b": {"c": 30}}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4 c5

[โ€œaโ€, โ€œbโ€]

NULL

[โ€œcโ€]

NULL

NULL

13.16.9. JSON_REMOVE Example

This statement:

SELECT JSON_REMOVE(v, '$[1]') AS c1
FROM (VALUES ('["a", ["b", "c"], "d"]')) AS t(v)
LIMIT 10;

returns:

c1

[โ€œaโ€, โ€œdโ€]

13.16.10. JSON_STORAGE_SIZE Example

This statement:

SELECT
	JSON_STORAGE_SIZE('[100, \"sakila\", [1, 3, 5], 425.05]') AS c1,
	JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"a\", \"c\": \"[1, 3, 5, 7]\"}') AS c2,
	JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"xyz\", \"c\": \"[1, 3, 5, 7]\"}') AS c3,
	JSON_STORAGE_SIZE('[100, \"json\", [[10, 20, 30], 3, 5], 425.05]') AS c4
LIMIT 10;

returns:

c1 c2 c3 c4

29

35

37

36

13.16.11. DECODE Example

This statement:

SELECT DECODE(f1, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c1,
	DECODE(f2, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c2,
	DECODE(f3, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c3,
	DECODE(f4, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c4,
	DECODE(f5, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c5
FROM (VALUES (1, 2, 3, 4, 5)) AS t(f1, f2, f3, f4, f5);

returns:

c1 c2 c3 c4 c5

aa

bb

cc

dd

ee

13.16.12. TRANSLATE example

This statement:

SELECT TRANSLATE('Aa*Bb*Cc''D*d', ' */''%', '_') as c1,
	TRANSLATE('Aa/Bb/Cc''D/d', ' */''%', '_') as c2,
	TRANSLATE('Aa Bb Cc''D d', ' */''%', '_') as c3,
	TRANSLATE('Aa%Bb%Cc''D%d', ' */''%', '_') as c4
FROM (VALUES (true)) AS t(f0);

returns:

c1 c2 c3 c4

Aa_Bb_CcD_d

Aa_Bb_CcD_d

Aa_Bb_CcD_d

Aa_Bb_CcD_d

14. User-Defined Functions

LeanXcale is extensible. One can define new functions using user code. There are several ways to define a function.

To implement a scalar function, there are 3 options:

  • Create a class with a public static eval method, and register the class;

  • Create a class with a public non-static eval method, and a public constructor with no arguments, and register the class;

  • Create a class with one or more public static methods, and register each class/method combination.

To implement an aggregate function, there are 2 options:

  • Create a class with public static init, add and result methods, and register the class;

  • Create a class with public non-static init, add and result methods, and a public constructor with no arguments, and register the class.

Optionally, add a public merge method to the class; this allows Calcite to generate code that merges sub-totals.

Optionally, make your class implement the SqlSplittableAggFunction interface; this allows Calcite to decompose the function across several stages of aggregation, roll up from summary tables, and push it through joins.

To implement a table function, there are 3 options:

To implement a table macro, there are 3 options:

  • Create a class with a static eval method that returns TranslatableTable, and register the class;

  • Create a class with a non-static eval method that returns TranslatableTable, and register the class;

  • Create a class with one or more public static methods that return TranslatableTable, and register each class/method combination.

LeanXcale infers the parameter types and result type of a function from the parameter and return types of the Java method that implements it. Further, you can specify the name and optionality of each parameter using the Parameter annotation.

14.1. Calling Functions with Named and Optional Parameters

Usually when you call a function, you need to specify all of its parameters, in order. But that can be a problem if a function has a lot of parameters, and especially if you want to add more parameters over time.

To solve this problem, the SQL standard allows you to pass parameters by name, and to define parameters which are optional (that is, have a default value that is used if they are not specified).

Suppose you have a function f, declared as in the following pseudo syntax:

FUNCTION f(
	INTEGER a,
	INTEGER b DEFAULT NULL,
	INTEGER c,
	INTEGER d DEFAULT NULL,
	INTEGER e DEFAULT NULL) RETURNS INTEGER

All of the functionโ€™s parameters have names, and parameters b, d and e have a default value of NULL and are therefore optional. In LeanXcale NULL is the only allowable default value for optional parameters.

When calling a function with optional parameters, you can omit optional arguments at the end of the list, or use the DEFAULT keyword for any optional arguments. Here are some examples:

  • f(1, 2, 3, 4, 5) provides a value to each parameter, in order;

  • f(1, 2, 3, 4) omits e, which gets its default value, NULL;

  • f(1, DEFAULT, 3) omits d and e, and specifies to use the default value of b;

  • f(1, DEFAULT, 3, DEFAULT, DEFAULT) has the same effect as the previous example;

  • f(1, 2) is not legal, because c is not optional;

  • f(1, 2, DEFAULT, 4) is not legal, because c is not optional.

You can specify arguments by name using the => syntax. If one argument is named, they all must be. Arguments may be in any other, but must not specify any argument more than once, and you need to provide a value for every parameter which is not optional. Here are some examples:

  • f(c => 3, d => 1, a => 0) is equivalent to f(0, NULL, 3, 1, NULL);

  • f(c => 3, d => 1) is not legal, because you have not specified a value for a and a is not optional.

15. Pattern Matching of CEP Events with MATCH_RECOGNIZE

MATCH_RECOGNIZE is a SQL extension for recognizing sequences of events in complex event processing (CEP).

The syntax is as follows:

matchRecognize=
	MATCH_RECOGNIZE '('
		[ PARTITION BY expression [, expression ]* ]
		[ ORDER BY orderItem [, orderItem ]* ]
		[ MEASURES measureColumn [, measureColumn ]* ]
		[ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
		[ AFTER MATCH
			( SKIP TO NEXT ROW
			| SKIP PAST LAST ROW
			| SKIP TO FIRST variable
			| SKIP TO LAST variable
			| SKIP TO variable )
		]
		PATTERN '(' pattern ')'
			[ WITHIN intervalLiteral ]
			[ SUBSET subsetItem [, subsetItem ]* ]
		DEFINE variable AS condition [, variable AS condition ]*
	')'

subsetItem=
	variable = '(' variable [, variable ]* ')'

measureColumn=
	expression AS alias

pattern=
	patternTerm [ '|' patternTerm ]*

patternTerm=
	patternFactor [ patternFactor ]*

patternFactor=
	patternPrimary [ patternQuantifier ]

patternPrimary=
	variable
	|   '$'
	|   '^'
	|   '(' [ pattern ] ')'
	|   '{-' pattern '-}'
	|   PERMUTE '(' pattern [, pattern ]* ')'

patternQuantifier=
	'*'
	|   '*?'
	|   '+'
	|   '+?'
	|   '?'
	|   '??'
	|   '{' { [ minRepeat ], [ maxRepeat ] } '}' ['?']
	|   '{' repeat '}'

intervalLiteral=
	INTERVAL 'string' timeUnit [ TO timeUnit ]

In patternQuantifier, repeat is a positive integer, and minRepeat and maxRepeat are non-negative integers.

16. Copying Data Between Two Tables

Currently, the syntax CREATE TABLE …​ AS SELECT …​ is not supported. In order to copy data from one table to another, one first need to create the new table and then do INSERT/UPSERT …​ SELECT. This is an example:

CREATE TABLE TDEST (f1_key BIGINT, f2_value VARBINARY, PRIMARY KEY(f1_key));

INSERT INTO TDEST SELECT * FROM TORIG;

Additionally, the WITH TEMPORAL AS clause for using a temporal table can be used as in the following example:

CREATE TABLE Persons (
	ID int NOT NULL,
	NAME VARCHAR(20) NOT NULL,
	AGE int NOT NULL,
	SALARY double NOT NULL,
	PRIMARY KEY (ID)
);

INSERT INTO Persons (WITH TEMPORAL AS (SELECT 1, 'Juan', 10, cast(1550.1 as double)) SELECT * FROM temporal);

17. JDBC Function Escapes

JDBC function escapes are a mechanism provided by the JDBC API to invoke database-specific functions and procedures using standard SQL syntax within a Java application. JDBC function escapes are useful to use functions in Java code.

The syntax for JDBC function escapes involves using curly braces {} to enclose the call to the database-specific functions. The format typically looks like {fn functionName(argument1, argument2, …​)}. The following sections provide the different functions for different data types.

17.1. Numeric

These are the Numeric JDBC function escapes:

Operator syntax Description

{fn ABS(numeric)}

Returns the absolute value of numeric

{fn ACOS(numeric)}

Returns the arc cosine of numeric

{fn ASIN(numeric)}

Returns the arc sine of numeric

{fn ATAN(numeric)}

Returns the arc tangent of numeric

{fn ATAN2(numeric, numeric)}

Returns the arc tangent of the numeric coordinates

{fn CBRT(numeric)}

Returns the cube root of numeric

{fn CEILING(numeric)}

Rounds numeric up, and returns the smallest number that is greater than or equal to numeric

{fn COS(numeric)}

Returns the cosine of numeric

{fn COT(numeric)}

Returns the cotangent of numeric

{fn DEGREES(numeric)}

Converts numeric from radians to degrees

{fn EXP(numeric)}

Returns e raised to the power of numeric

{fn FLOOR(numeric)}

Rounds numeric down, and returns the largest number that is less than or equal to numeric

{fn LOG(numeric)}

Returns the natural logarithm (base e) of numeric

{fn LOG10(numeric)}

Returns the base-10 logarithm of numeric

{fn MOD(numeric1, numeric2)}

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

{fn PI()}

Returns a value that is closer than any other value to pi

{fn POWER(numeric1, numeric2)}

Returns numeric1 raised to the power of numeric2

{fn RADIANS(numeric)}

Converts numeric from degrees to radians

{fn RAND(numeric)}

Returns a random double using numeric as the seed value

{fn ROUND(numeric1, numeric2)}

Rounds numeric1 to numeric2 places right to the decimal point

{fn SIGN(numeric)}

Returns the signum of numeric

{fn SIN(numeric)}

Returns the sine of numeric

{fn SQRT(numeric)}

Returns the square root of numeric

{fn TAN(numeric)}

Returns the tangent of numeric

{fn TRUNCATE(numeric1, numeric2)}

Truncates numeric1 to numeric2 places right to the decimal point

17.2. String

These are the String JDBC function escapes:

Operator syntax Description

{fn ASCII(string)}

Returns the ASCII code of the first character of string; if the first character is a non-ASCII character, returns its Unicode code point; returns 0 if string is empty

{fn CONCAT(character, character)}

Returns the concatenation of character strings

{fn INSERT(string1, start, length, string2)}

Inserts string2 into a slot in string1

{fn LCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to lower case

{fn LENGTH(string)}

Returns the number of characters in a string

{fn LOCATE(string1, string2 [, integer])}

Returns the position in string2 of the first occurrence of string1. Searches from the beginning of string2, unless integer is specified.

{fn LEFT(string, length)}

Returns the leftmost length characters from string

{fn LTRIM(string)}

Returns string with leading space characters removed

{fn REPLACE(string, search, replacement)}

Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed

{fn REVERSE(string)}

Returns string with the order of the characters reversed

{fn RIGHT(string, integer)}

Returns the rightmost length characters from string

{fn RTRIM(string)}

Returns string with trailing space characters removed

{fn SUBSTRING(string, offset, length)}

Returns a character string that consists of length characters from string starting at the offset position

{fn UCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to upper case

17.3. Date/time

These are the DATE and TIME JDBC function escapes:

Operator syntax Description

{fn CURDATE()}

Equivalent to CURRENT_DATE

{fn CURTIME()}

Equivalent to LOCALTIME

{fn NOW()}

Equivalent to LOCALTIMESTAMP

{fn YEAR(date)}

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

{fn QUARTER(date)}

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

{fn MONTH(date)}

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

{fn WEEK(date)}

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

{fn DAYOFYEAR(date)}

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

{fn DAYOFMONTH(date)}

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

{fn DAYOFWEEK(date)}

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

{fn HOUR(date)}

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

{fn MINUTE(date)}

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

{fn SECOND(date)}

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

{fn TIMESTAMPADD(timeUnit, count, datetime)}

Adds an interval of count timeUnits to a datetime

{fn TIMESTAMPDIFF(timeUnit, timestamp1, timestamp2)}

Subtracts timestamp1 from timestamp2 and returns the result in timeUnits

17.4. System

These are the system JDBC function escapes:

Operator syntax Description

{fn DATABASE()}

Equivalent to CURRENT_CATALOG

{fn IFNULL(value1, value2)}

Returns value2 if value1 is null

{fn USER()}

Equivalent to CURRENT_USER

17.4.1. Conversion

These are the conversion JDBC function escapes:

Operator syntax Description

{fn CONVERT(value, type)}

Cast value into type

DML Triggers

A trigger is procedural code that is automatically executed in response to certain events on a particular table. DML triggers run when a user tries to modify data through a data manipulation language (DML) event. They can be used to enforce business rules and data integrity, query other tables, and include complex statements. DML events are INSERT, UPDATE, or DELETE statements on a table.

CREATE TRIGGER myTrigger BEFORE INSERT UPDATE ON t
FOR EACH ROW EXECUTE genericBackupTrigger('t_backup,ts')
PRIORITY 10;

DISABLE TRIGGER myTrigger ON t;

ENABLE TRIGGER myTrigger ON t;

DROP TRIGGER myTrigger ON TABLE t;

1. CREATE TRIGGER Statement

The syntax to create a trigger is as follow:

createTrigger:
      CREATE TRIGGER [ IF NOT EXISTS ] triggerName
      (BEFORE | AFTER)
      [INSERT] [UPDATE] [DELETE]
      ON tableName FOR EACH ROW EXECUTE
      triggerFunction ['(' arg ')']
      [PRIORITY priority]

where

  • triggerName is the name used by the user to identify the trigger.

  • tableName is the table where the trigger applies

  • triggerFunction is the Trigger Function name as the result of getTriggerName(). See Trigger Function interface

  • arg is the trigger function argument as string literal passed in triggerExecutor(). Check [triggerExecutor params]

  • priority is an integer literal used to order teh trigger execution in case that the table have more than one. Note that the resulting new values from one trigger will be passed to the execution of the next one.

Note that you can define 1 to 3 dml events among INSERT, UPDATE and DELETE.

2. Trigger Functions System Table

To find out the loaded trigger functions, you can query TRIGGER_FUNCTIONS system table.

> select * from LXSYSMETA.TRIGGER_FUNCTIONS;
+---------------------------+-------------------------------------------------------------------------+
|        triggerName        |                               description                               |
+---------------------------+-------------------------------------------------------------------------+
| HashFieldTrigger          | Hash Field Trigger                                                      |
| UUIDTrigger               | UUID Trigger                                                            |
| AutoGeohashTrigger        | Auto Geohash Trigger                                                    |
| ForceAutoIncrementTrigger | Force Auto Increment Trigger                                            |
| AutoIncrementTrigger      | Auto Increment Trigger                                                  |
| DefaultValueTrigger       | Default Value Trigger                                                   |
| DeleteForeignKeyTrigger   | Check for Referential Integrity when deleting row from referenced table |
| ForeignKeyTrigger         | Check for Referential Integrity                                         |
+---------------------------+-------------------------------------------------------------------------+

3. Custom Trigger Functions

Trigger functions are java objects that are automatically loaded into the Query Engine and provide a mechanism to build the Trigger’s executors, that holds the actual trigger procedural code. In order to create a custom trigger function, you need to implement from TriggerFunctionI Interface and define the following methods:

ResultType Function Description

String

getTriggerName()

Define the trigger function name

String

description()

Define a description for the trigger function so users can figure out what it does

Object

buildArguments(String arg)

Build an instance of argumntes from freestyle string. The instance is passed to triggerExecutor method.

TriggerExecutor

triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object args)

Creates an istance of TriggerExecutor. See Trigger Executor Method

public class GenericBackupTriggerExample implements TriggerFunctionI {
  static String triggerName = "genericBackupTrigger";

  @Override
  public String getTriggerName() {
    return triggerName;
  }

  @Override
  public Object buildArguments(String arg) {
    return arg.split(",");
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    String[] args = (String[]) arg;
    return new GenericBackupTriggerExecutor(triggerUtils, newValues, oldValues, args[0], args[1]);
  }

  @Override
  public String description() {
    return "Generic backup Trigger example";
  }
}

3.1. Method triggerExecutor

The method triggerExecutor builds an instance of TriggerExecutor accordingly to the given params.

Param position Param type Description

1

TriggerUtils

Utils. See Trigger Utils

2

TypedNamedValuesI

New values for table’s row. Null on deletion

3

TypedNamedValuesI

Old values for table’s row. Null on insertion

4

String

Trigger argument

3.2. Custom triggerExecutor

Trigger’s executors are the ones that actually holds the procedural code to be executed when a trigger is fired. You would just need to extend from TriggerExecutor abstract class and define the following methods:

ResultType Function Description

void

execute()

Executes the procedural code

In order to implement the trigger execution code, you might probably need to use the following final methods:

ResultType Function Description

boolean

hasValues()

Tell if the new values are available

boolean

hasOldValues()

Tell if the old values are available

ExpressionType

getType(String column)

Get the column’s type

Object

getValue(String column)

Get new value for column

Object

getOldValue(String column)

Get old value for column

void

setValue(String column, Object value)

Set new value for column

String[]

valuesFieldNames()

Get values field names

Note that TriggerExecutor abstract class constructor has the following parameters:

Param position Param type Description

1

TypedNamedValuesI

New values for table’s row

2

TypedNamedValuesI

Old values for table’s row

  static public class GenericBackupTriggerExecutor extends TriggerExecutor {

    public final String backupTableName;
    public final String tsColumnName;
    public final TriggerUtils triggerUtils;

    public GenericBackupTriggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, String backupTableName, String tsColumnName) {
      super(newValues, oldValues);
      this.backupTableName = backupTableName;
      this.tsColumnName = tsColumnName;
      this.triggerUtils = triggerUtils;
    }

    @Override
    public String getExecutorName() {
      return triggerName;
    }

    @Override
    public void execute() {
      if (!hasValues()) { // delete
        return;
      }
      try (Session session = triggerUtils.session()) {
        Table table = session.database().getTable(backupTableName);
        Tuple tuple = table.createTuple();
        for (String column : valuesFieldNames()) {
          tuple.put(column, getValue(column));
        }
        tuple.put(tsColumnName, Timestamp.valueOf(LocalDateTime.now()));
        table.insert(tuple);
        // no need to commit when sharing QE's transaction
      } catch (Exception exception) {
        throw new LeanxcaleRuntimeException(exception);
      }
    }
  }

4. Trigger Utils

An instance of trigger utils provides connection information and some other functionalities that could be useful in order to implemet a trigger like creating a no-SQL direct session, get the next or current value of a sequence, generate a uuid, etc. Find below the available methods:

ResultType Function Description

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID

Connection

getConnection()

Get current JDBC connection

Session

session(Settings Settings)

Creates a new session to a Kivi database. It shares the current kivi connection.

Session

session()

Creates a new session to a Kivi database. It shares the current kivi connection and transaction to connect through the direct API.

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

As you might realize, TableFunctionUtils is an extended implementation of TriggerUtils, see Table Function Utils. You could use TableFunctionUtils within trigger’s code but we strongly recommend to use the given TriggerUtils instance so the trigger can be fired by no-SQL API events too.

5. Expose a Trigger as a Table Function

For developments environments, it could be interesting to execute a table funtion in order to test a trigger behavior. In order to do this, we provide an abstract class TriggerTableFunction that will save you from do it yourself. As you can see in the following example, you would just need to add the definitions of the table’s function methods getFunctionName() and eval(..).

public class BackupTriggerExample extends TriggerTableFunction {

  static String triggerName = "backupTrigger";
  static String[] columnNames = {"PK1", "PK2", "F3", "F4"};

  public static String getFunctionName() {
    return triggerName;
  }

  @Override
  public String getTriggerName() {
    return getFunctionName();
  }

  @Override
  public Object buildArguments(String s) {
    return s;
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    return new BackupTriggerExecutor(newValues, oldValues);
  }

  @Override
  public String description() {
    return "Trigger example";
  }

  public ScannableTable eval(Long pk1, String pk2, String f3, Integer f4) {
    Object[] values = {pk1, pk2, f3, f4};
    int[] types = {Types.BIGINT, Types.VARCHAR, Types.VARCHAR, Types.INTEGER};
    NamedValuesI newValues = new TypedNamedValuesImpl(columnNames, values, types);
    return evalTrigger(newValues, null, buildArguments(null));
  }
}

Don’t forget to return the evalTrigger(NamedValuesI newValues, NamedValuesI oldValues, String args) result so the trigger is executed and its result is returned by the table function. Note that evalTrigger params will be mapped to the triggerExecutor ones, and it will invoke buildArguemnts.

 try (ResultSet rs = s.executeQuery("exec backupTrigger(1, 'a', 'af3', null)")) {
        assertThat(rs.next(), is(true));
        assertThat(rs.getLong(1), is(1L));
        assertThat(rs.getString(2), is("a"));
        assertThat(rs.getString(3), is("af3"));
        assertThat(rs.getInt(4), is(100));
        assertThat(rs.next(), is(false));
      }

5.1. Typed Named Values

The TypedNamedValuesI interface provides a interface to access a value by name. This is the following:

ResultType

Function

Description

ExpressionType

getType(String name)

Get the column’s type

Object

getValue(String name)

Get the value for name. Null when not found or value is null

void

setValue(String name, Object value)

Set the value for name

String[]

getFieldNames()

Get field names

Object[]

getValues()

Get values ordered as getFieldNames()

boolean

contains(String name)

Tells if name exists

int

size()

Tells size of the values

String

sqlTableName()

Table’s name the values belong to. Null when unknown

Anyway, we provide a basic TypedNamedValuesI implementation that might satisfy your needs.

  /**
   * Create a TypedNamesValues
   * @param positionsMap Map value position by name
   * @param row values
   * @param typeMap Map SQL type by name
   */
  public TypedNamedValuesImpl(Map<String, Integer> positionsMap, Object[] row, Map<String, ExpressionType> typeMap) {
    ...
  }

  /**
   * Create a NamedValues object
   * @param names values names
   * @param row values ordered by names
   * @param types  SQL types ordered by names
   */
  public TypedNamedValuesImpl(String[] names, Object[] row, int[] types) {
    ...
  }

5.2. Expression Type

The ExpressionType interface provides information about a table’s column or an expression. This is the following:

ResultType

Function

Description

int

sqlType()

Expression’s SQL JDBC type

int

precision()

Expression’s SQL JDBC type precision

int

scale()

Expression’s SQL JDBC type scale

boolean

isNullable()

Tells if the expression is nullable

6. System Triggers

Despite of custom triggers, we provide some system triggers that could be used directly by the user.

6.1. Auto Increment Trigger

This trigger is created automatically for a table when using the IDENTITY clause on table’s creation or alteration. Its execution fills the desired column with sequence’s next val when it is null. Its argument is a list of strings separated by coma. Find below their meaning:

Argument position

Description

1

Name of the column to be assigned with sequence’s next val when null

2

Sequence name. We suggest to include the schema name so the trigger won’t fail when connected to a different schema

The following table’s creation is equivalent to the one below:

create table tauto (i int AS IDENTITY start with 10 increment by 10, v VARCHAR, PRIMARY KEY (i));
create table tauto (i int, v VARCHAR, PRIMARY KEY (i));

create sequence tauto_i_seq start with 10 increment by 10;

CREATE TRIGGER tauto_i_seq_trigger BEFORE INSERT ON tauto
FOR EACH ROW EXECUTE autoIncrementTrigger('i,tauto_i_seq')
PRIORITY 100;

-- Note that the autoincrement colums will become an implicit column for inserts
insert into tauto values ('value');

6.2. Geohash Trigger

This trigger is created automatically for a table when using the PRIMARY GEOHASH KEY or GEOHASH clauses on table’s creation. Hidden geohash fields are created too when using these clauses. The trigger execution fills the desired column with the calculated geohash when it is null. Its argument is a list of strings separated by coma. Find below their meaning:

Argument position

Description

1

Name of the column to be assigned with the calculated geohash

2

WKT field if there is only two arguments, or Latitude field if there is three

3

Longitude field if there is three arguemnts

The following table’s creation is equivalent to the one below:

create table t (wkt VARCHAR, latitude double, longitude double, PRIMARY GEOHASH KEY (wkt), GEOHASH (latitude,longitude));
create table t (wkt VARCHAR, latitude double, longitude double, gh_wkt VARCHAR PRIMARY KEY, gh_lat_long VARCHAR);

CREATE TRIGGER gh_wkt_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_wkt,wkt')
PRIORITY 100;

CREATE TRIGGER gh_lat_long_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_lat_long,latitude,longitude')
PRIORITY 100;

-- Note that GEOHASH clause will create and index too, so we create it here manually
CREATE INDEX gh_lat_long_index ON t (gh_lat_long);

-- Note that the geohash column will become an implicit column for inserts and selects
insert into t values (ST_asText(ST_MakePoint(3.41,40.25)),cast(40.25 as double), cast(3.41 as double));

Geohash fields are fields indexed geographically for GIS applications

6.3. UUID Trigger

This trigger is created automatically for a table when not defining a PRIMARY KEY. A hidden UUID column is created too. The trigger execution fills the desired column with an UUID valuen when it is null. Its argument is Name of the column to be assigned with uuid.

The following table’s creation is equivalent to the one below:

create table tauto (v VARCHAR);
create table t (id VARCHAR, v VARCHAR, PRIMARY KEY (id));

CREATE TRIGGER uuid_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE UUIDTrigger('id')
PRIORITY 100;

-- Note that the uuid column will become a implicit column for inserts and selects
insert into t values ('value');

6.4. Referential Integrity Triggers

Referential integrity is implemented as well using triggers. Every time a Foreign key constraint is added to a table, two triggers are added to the table and one to the referenced table. They are disabled by default. You should enable them to activate referential checks.

> create table tpk (i int, v VARCHAR, i2 int, PRIMARY KEY (i));

> create table tfk (pk int, v VARCHAR, fk int, PRIMARY KEY (pk), CONSTRAINT fKi2 FOREIGN KEY (fk) REFERENCES tpk(i));

> ENABLE TRIGGER fKi2 ON tfk;

> ENABLE TRIGGER fKi2 ON tpk;

> select * from LXSYSMETA.TRIGGERS;
triggerName      FKI2
triggerFunction  DeleteForeignKeyTrigger
args             {"fkTableName":"APP.TFK","columns":["I"],"fkColumns":["FK"]}
isBefore         true
isInsert         false
isUpdate         false
isDelete         true
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TPK

triggerName      FKI2_NO_CONFLICT
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":false,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         false
tableCat         db
tableSchem       APP
tableName        TFK

triggerName      FKI2
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":true,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TFK

The two triggers added to the table use the same trigger function ForeignKeyTrigger but with different params. One will only check that the referenced key exist and the other will create a conflict on it too. Both will be triggered on insertion. You would only need enable the one you need.

The trigger on the referenced table DeleteForeignKeyTrigger would prevent from deleting a row whose primary key is referenced by the table that holds the foreing key. Note that an index with same constraint name is created along with the foreign key’s triggers to improve DeleteForeignKeyTrigger performance.

Table Functions

A table function refers to a user-defined function that returns a table or a set of rows as its result. Table functions are used to encapsulate logic and calculations, allowing you to treat the result of the function as if it were a table in your SQL queries.

1. How to Develop and Deploy a Table Function

First you need to add the LeanXcale maven repository and the qe-tableFunctions dependency to your pom.xml file:

<repositories>
    <repository>
        <id>maven-releases</id>
        <url>https://nexus.leanxcale.com/repository/maven-releases</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.leanxcale</groupId>
        <artifactId>qe-tableFunctions</artifactId>
        <version>1.7.6</version>
    </dependency>
</dependencies>

In order to create a custom table function, a Java class has to be written implementing the AutoloadTableFunction interface and defining the following methods:

Function Description Return

getFunctionName()

Define the custom table function name

String

getDescription()

Provides the custom table function description

String

eval(<table function arguments>)

Generate the collection of rows to be queried

ScannableTable

You can define as many eval functions as you need as long as their signatures are different. In order to write the eval functions returning ScannableTable, the following methods should be implemented:

  • getRowType

    • RelDataType getRowType(RelDataTypeFactory typeFactory)

    • Description:

      • Returns this table’s row type. This is a struct type whose fields describe the names and types of the columns in this table. The implementer must use the type factory provided. This ensures that the type is converted into a canonical form; other equal types in the same query will use the same object.

    • Parameters:

      • typeFactory: Type factory with which to create the type.

    • Returns:

      • Row type

  • getStatistic

    • Statistic getStatistic()

    • Description:

      • Statistic getStatistic()

    • Returns:

      • Statistic

  • getJdbcTableType

    • Description:

      • Returns the type of table.

    • Returns:

      • Table type.

  • sRolledUp

    • Description:

      • Determines whether the given column has been rolled up.

    • Returns:

      • true if the column is rolled up.

  • rolledUpColumnValidInsideAgg

    • Description:

      • Determines whether the given rolled up column can be used inside the given aggregate function. You can assume that isRolledUp(column) is true.

    • Parameters:

      • column - The column name for which isRolledUp is true.

      • call - The aggregate call.

      • parent - Parent node of call in the SqlNode tree.

      • config - Config settings. May be null.

    • Returns:

      • true if the given aggregate call is valid.

All the table function classes must belong to the package com.leanxcale in order to have them auto-loaded when LeanXcale starts.

Find below a complete example:

package com.leanxcale;

import com.leanxcale.QE.tableFunction.AutoloadTableFunction;
import org.apache.calcite.DataContext;
import org.apache.calcite.config.CalciteConnectionConfig;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.calcite.util.ImmutableBitSet;
import java.util.Collections;

public class SeriesGeneratorTableFunction implements AutoloadTableFunction {

  public static String getFunctionName() {
    return "series_generator";
  }

  public ScannableTable eval(Integer start, Integer stop, Integer step) {
    int actualStep = (null == step) ? 1 : step;
    int numResults = (1 + (stop - start)/actualStep);
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        Object[][] results = new Object[numResults][1];
        for (int inx = 0; inx < numResults; inx++) {
          results[inx][0] = start + inx*actualStep;
        }
        return Linq4j.asEnumerable(results);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
                .add("SERIES", SqlTypeName.INTEGER)
                .build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(numResults, Collections.singletonList(ImmutableBitSet.of(numResults)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

Once you do mvn package, a .jar file gets created with the custom table functions you defined. This .jar should be included in the Query Engine’s classpath (lxs/LX-BIN/lib). The Query Engine should be restarted after adding the table function .jar file. For the cloud version, the leanxcale console will provide the means to add the table function code to your database.

2. Using Table Functions

You can use a table function in the FROM clause of a query as the first argument to the function table. In this example, series_generator is the table function:

SELECT SERIES FROM table(series_generator(1, 10, 1)) WHERE SERIES > 3;

+--------+
| SERIES |
+--------+
| 4      |
| 5      |
| 6      |
| 7      |
| 8      |
| 9      |
| 10     |
+--------+
7 rows selected (0.008 seconds)

Take into account that if the parameter you’re passing to your table function is a string, you’ll need to use single quotes. Double quotes are not allowed. This is an example:

SELECT * FROM table(my_new_tablefunction('test_string'));

But this does not work:

SELECT * FROM table(my_new_tablefunction("test_string"));

3. Table Function Utils

In order to ease the writing of table function, you can use the TableFunctionUtils class. You will be able to execute SQL queries within your Table Function.

ResultType Function Description

TableFunctionUtils

utils()

Static function to build an instance for the table function utils according to the current connection

Connection

getConnection()

Provide an connection to perfom SQL actions. This connection cannot be committed or rollbacked

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID ////

Session

session(Settings Settings)

Creates a new session to the KiVi database. It shares the current KiVi connection.

Session

session()

Creates a new session to the KiVi database. It shares the current KiVi connection and transaction to connect through the direct API. ////

Table

getTable(String tableName)

Provide a table object that exist within current transaction context so you can perform basic operations on it

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

Find below an example:

public class TableFunctionUtilsExample implements AutoloadTableFunction {
  public static final String functionName = "TableFunctionUtilsExample";

  public static String getFunctionName() {
    return functionName;
  }

  public static String getDescription() {
    return functionName;
  }

  public ScannableTable eval(Object parentTableName, Object sonTableName) {
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        List<Object[]> result = new ArrayList<>();

        TableFunctionUtils utils = TableFunctionUtils.utils();
        Settings settings = new Settings();
        settings.setDatabase(utils.getDbName());
        settings.setSchema(utils.getSchemaName());
        settings.disableConflictChecking();
        settings.disableLogging();
        try (Session session = utils.session(settings)) {
          Table table = session.database().getTable((String) parentTableName);
          try (ResultSet rs = utils.getConnection().createStatement().executeQuery("select i_name, max_price from " + sonTableName)) {
            while (rs.next()) {
              Tuple tuple = table.find()
                  .filter(Filters.eq("i_name", rs.getString(1)).and(Filters.eq("i_price", rs.getInt(2))))
                  .iterator().next();
              Object[] row = new Object[3];
              row[0] = tuple.get("i_id");
              row[1] = tuple.get("i_name");
              row[2] = tuple.get("i_price");
              result.add(row);
            }
          }
        } catch (Exception exception) {
          throw new LeanxcaleRuntimeException(exception);
        }

        return Linq4j.asEnumerable(result);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        RelDataTypeFactory.Builder builder = typeFactory.builder();
        builder.add("ID", SqlTypeName.INTEGER);
        builder.add("NAME", SqlTypeName.VARCHAR);
        builder.add("PRICE", SqlTypeName.INTEGER);
        return builder.build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(1, Collections.singletonList(ImmutableBitSet.of(1)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

GIS/Spatial Support

LeanXcale’s GIS support and LeanXcale’s high performance over GIS data is based a search algorithm based on Geohashingh. Geohash is a public domain geocode system invented in 2008 by Gustavo Niemeyer and G.M. Morton that encodes a geographic location into a short string of letters and digits. Geohashes offer properties like arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size with a gradual lose of precision. LeanXcale leverages this gradual precision to perform ST geometric and geographic operations over GIS data following the Geohash property that lies in nearby places will often present similar prefixes. In particular, LeanXcale’s geometry predicates and operators exercise this property.

In this section, we will follow the following conventions:

In the โ€œCโ€ (for โ€œcompatibilityโ€) column, โ€œoโ€ indicates that the function implements the OpenGIS Simple Features Implementation Specification for SQL, version 1.2.1, while โ€œpโ€ indicates that the function is a PostGIS extension to OpenGIS.

1. Geohash Indexes

LeanXcale can speed up queries over geospatial data by using Geohash indexing. Geohash is an encoding geographic location on Base 32 where the even bits represent the longitude precision and the odd bits represent the latitude precision. Geohash has some interesting properties so from an envelop you can narrow down the bins in which you have to look for geometries.

More information can be found in the following links:

The geohash index can be the primary key of a table or a secondary. In general the primary key performs better and is the preferred way to define a geom table.

To create a table with geohash you need the following syntax:

  • Create a table whose primary key is a Geohash key. This will create a hidden geohash field in your table that will be used as primary key

CREATE TABLE
	geohashedcountries(
		 countrycode VARCHAR,
		 geomLocation VARCHAR,
		 name VARCHAR,
	PRIMARY GEOHASH KEY(geomLocation));
  • Create the table, this time with a secondary index for the geohash. Again a hidden geohash table will be used:

CREATE TABLE
	geohashedcountries(
		 countrycode VARCHAR,
		 geomLocation VARCHAR,
		 name VARCHAR,
	PRIMARY KEY(name),
	GEOHASH(geomLocation));

Instead of a geometry field which can be an arbitrary geometry, you may use two fields that will be used as latitude and longitude:

CREATE TABLE geohashedcitypoints(
  citycode VARCHAR,
  latitude DOUBLE,
  longitude DOUBLE,
  name VARCHAR,
  PRIMARY GEOHASH KEY(latitude, longitude));

When running a geo query, LeanXcale’s query optimizer will automatically detect the Geohash index and use it to narrow down your search:

EXPLAIN PLAN FOR (
	SELECT name FROM geohashedcountries
	WHERE ST_Contains(
		ST_Buffer(ST_MakePoint(cast(-3.67 as double), cast(40.42 as double)), cast(0.5 as double)),
			ST_GeomFromText(geomLocation)
  ));

PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[-3.67:DECIMAL(19, 0)], expr#4=[40.42:DECIMAL(19, 0)], expr#5=[ST_MAKEPOINT($t3, $t4)], expr#6=[0.5:DOUBLE], expr#7=[ST_BUFFER($t5, $t6)], expr#8=[ST_GEOMFROMTEXT($t0)], expr#9=[ST_CONTAINS($t7, $t8)], NAME=[$t1], $condition=[$t9])
            KiviPKTableScanRel(table=[[leanxcale, ADHOC, GEOHASHEDCOUNTRIES, filter:ST_GH_in($2, ST_GH_minMaxRegions(ST_BUFFER(ST_MAKEPOINT(-3.67:DOUBLE(19, 0), 40.42:DOUBLE(19, 0)), 0.5:DOUBLE))), project:[1, 2, 3]]], project=

Most frequently, all the functionality for geohashing is done internally so there is no need to explicitly define any condition for geohash. Anyway, the following functions are supported to query through the Geohash indexes:

ResultType Function Description

boolean

ST_GH_in(String geohash, String[][] minMaxRegions)

Check if a string is between any region of the list

String[][]

ST_GH_minMaxRegions(Geom geom)

Calculate the min max geohash regions(bins) the geom belongs to

String[]

ST_GH_encodeGeom(Geom geom)

Encodes the given geometry into a list of geoHash that contains it. The first item in the list would be the center

String

ST_GH_minGH(String geohash)

Generate the min value in geohash region

String

ST_GH_maxGH(String geohash)

Generate the max value in geohash region

String

ST_GH_encodeLatLon(double latitude, double longitude, int precision)

Encodes the given latitude and longitude into a geohash with the indicated precision (number of characters==number of 5bits groups)

String

ST_GH_encodeLatLon(double latitude, double longitude)

Encodes the given latitude and longitude into a geohash. Default precision is 12

String

ST_GH_encodeFromBinaryString(String binaryString)

Encodes the given binary string into a geohash.

String

ST_GH_encodeFromLong(long hashVal, int precision)

Encodes the given long into a geohash.

Double[]

ST_GH_decode(String geoHash)

Decodes the given geohash into a latitude and longitude

long

ST_GH_decodeToLong(String geoHash)

Decodes the given geoHash into bits as long value

String

ST_GH_decodeToBinaryString(String geoHash)

Decodes the given geoHash into a binary string

String

ST_GH_adjacent(String geoHash)

Returns the 8 adjacent hashes in the following order: N, NE, E, SE, S, SW, W, NW

String

ST_GH_regionsWithinDistance(double latitude, double longitude, double distance)

Returns the hashes that include the points within the specified distamce

String[]

ST_GH_regionsWithinDistance(Geom geom, Object distance)

Returns the hashes that include the points within the specified distance from the Geom.

String

ST_GH_northernNeighbour(String geoHash)

Returns the immediate neighbouring hash to the north

String

ST_GH_southernNeighbour(String geoHash)

Returns the immediate neighbouring hash to the south

String

ST_GH_westernNeighbour(String geoHash)

Returns the immediate neighbouring hash to the west

String

ST_GH_easternNeighbour(String geoHash)

Returns the immediate neighbouring hash to the east

Double[]

ST_GH_boundingBox(String geoHash)

Return the list of geohash limits for: min Latitude, min Longitude, max Latitude, max Longitude

2. Geometry Conversion Functions (2D)

C Operator syntax Description

p

ST_AsText(geom)

Alias for ST_AsWKT

o

ST_AsWKT(geom)

Converts geom โ†’ WKT

o

ST_GeomFromText(wkt [, srid ])

Returns a specified GEOMETRY value from WKT representation

o

ST_LineFromText(wkt [, srid ])

Converts WKT โ†’ LINESTRING

o

ST_MLineFromText(wkt [, srid ])

Converts WKT โ†’ MULTILINESTRING

o

ST_MPointFromText(wkt [, srid ])

Converts WKT โ†’ MULTIPOINT

o

ST_MPolyFromText(wkt [, srid ])

Converts WKT โ†’ MULTIPOLYGON

o

ST_PointFromText(wkt [, srid ])

Converts WKT โ†’ POINT

o

ST_PolyFromText(wkt [, srid ])

Converts WKT โ†’ POLYGON

3. Geometry Creation Functions (2D)

C Operator syntax Description

o

ST_MakeLine(point1 [, point ]*)

Creates a line-string from the given POINTs (or MULTIPOINTs)

p

ST_MakePoint(x, y [, z ])

Alias for ST_Point

o

ST_Point(x, y [, z ])

Constructs a point from two or three coordinates

4. Geometry Properties (2D)

C Operator syntax Description

o

ST_Boundary(geom [, srid ])

Returns the boundary of geom

o

ST_Distance(geom1, geom2)

Returns the distance between geom1 and geom2

o

ST_GeometryType(geom)

Returns the type of geom

o

ST_GeometryTypeCode(geom)

Returns the OGC SFS type code of geom

o

ST_Envelope(geom [, srid ])

Returns the envelope of geom (which may be a GEOMETRYCOLLECTION) as a GEOMETRY

o

ST_X(geom)

Returns the x-value of the first coordinate of geom

o

ST_Y(geom)

Returns the y-value of the first coordinate of geom

5. Geometry Properties (3D)

C Operator syntax Description

p

ST_Is3D(s)

Returns whether geom has at least one z-coordinate

o

ST_Z(geom)

Returns the z-value of the first coordinate of geom

6. Geometry Predicates

C Operator syntax Description

o

ST_Contains(geom1, geom2)

Returns whether geom1 contains geom2

p

ST_ContainsProperly(geom1, geom2)

Returns whether geom1 contains geom2 but does not intersect its boundary

o

ST_Crosses(geom1, geom2)

Returns whether geom1 crosses geom2

o

ST_Disjoint(geom1, geom2)

Returns whether geom1 and geom2 are disjoint

p

ST_DWithin(geom1, geom2, distance)

Returns whether geom1 and geom are within distance of one another

o

ST_EnvelopesIntersect(geom1, geom2)

Returns whether the envelope of geom1 intersects the envelope of geom2

o

ST_Equals(geom1, geom2)

Returns whether geom1 equals geom2

o

ST_Intersects(geom1, geom2)

Returns whether geom1 intersects geom2

o

ST_Overlaps(geom1, geom2)

Returns whether geom1 overlaps geom2

o

ST_Touches(geom1, geom2)

Returns whether geom1 touches geom2

o

ST_Within(geom1, geom2)

Returns whether geom1 is within geom2

7. Geometry Operators (2D)

The following functions combine 2D geometries.

C Operator syntax Description

o

ST_Buffer(geom, distance [, quadSegs | style ])

Computes a buffer around geom

o

ST_Union(geom1, geom2)

Computes the union of geom1 and geom2

o

ST_Union(geomCollection)

Computes the union of the geometries in geomCollection

See also: the ST_Union aggregate function.

8. Geometry Projection Functions

C Operator syntax Description

o

ST_SetSRID(geom, srid)

Returns a copy of geom with a new SRID

o

ST_Transform(geom, srid)

Transforms geom from one coordinate reference system (CRS) to the CRS specified by srid

Statistics

Statistics are required to have good performance on the queries. These statistics are used by the SQL planner to choose the optimal execution plan for an SQL query and the current data. The syntax is:

analyzeStatistics=
ANALYZE STATISTICS FOR
	[TABLE table_name]
	| [ALL TABLES];

This process scans the full table and generates internal statistics that are persisted. The length of this process depends on the size of the table since it iterates the whole table, row per row. We recommend it to do it periodically and after a massive data ingestion. A rule of thumb is when the accumulated changes since the last running of the statistics reach 10% of the data then the statistics should be run.

Hints

A hint is an instruction to the optimizer. When writing SQL, you may know information about the data that is unknown to the optimizer. Sometimes the optmizer might not generate the most optimal query plan. Hints enable to change aspects of the query plan generated by the optimizer.

There are two ways to define a hint:

  • Inline hints

  • Context hints

1. Inline Hints

Inline hints are defined as special comments. Depending on the hint type, it can be written in two locations:

  • Query Hint: right after the SELECT keyword.

  • Table Hint: right after the referenced table name.

Here you can see the kinds of hint:

select /*+ fixJoinOrder */ sum(a.i) from t1 /*+ forceAccess(idx1) */ a, t2 b where a.v = b.v;

The syntax is as follows:

hintComment:
	'/*+' hint [, hint ]* '*/'

hint:
	hintName
	| hintName '(' optionKey '=' optionVal [, optionKey '=' optionVal ]* ')'
	| hintName '(' hintOption [, hintOption ]* ')'

optionKey:
	simpleIdentifier
	| stringLiteral

optionVal:
	stringLiteral

hintOption:
	simpleIdentifier
 	| numericLiteral
	| stringLiteral

2. Query Hints

2.1. Force the Use of an Index

This hint forces the access to a table through a defined index. The hint name is forceAccess. It takes two parameters:

  • Table name.

  • Index name.

This is an example forcing to use index idx1 to access table t1:

select /*+ forceAccess(t1, idx1) */ sum(i) from t1 a
	where a.i between 1 and 5 and a.i2 between 11 and 20;

2.2. Disable Pushdown

Queries in LeanXcale can be executed either on the query engine, in the storage engines, or both, depending on the query. The part of the query executed in the storage engines is said to be pushed down to the storage engines. Sometimes one might want to prevent pushing down operations to storage engines for a given table in a query. In general, this will negatively impact performance, so only use it, if you actually know what you are doing. This means that all the work will be done by the query engine except for the scans.

This is an example that disables the pushdown for table t1:

select /*+ disablePushdown(t1) */ sum(i) from t1 a where a.i
	between 1 and 5 and a.i2 between 11 and 20;

2.3. Disable Join Pushdown

This hint disables the use of relops query engine. In general, this will negatively impact performance. This means that all join actions will be executed by the Java query engine instead of relops (the C query engine).

This is an example:

select /*+ skip_relops */ sum(a.i) from t1 a, t2 b where a.v = b.v

2.4. Skip Online Aggregate

This hint disables translating table’s query into a query over any of its online aggregates. In general, this will negatively impact performance.

select /*+ skip_oa(item) */ max(i_price), min(i_price), count(*) from item;

2.5. Fix Join Order

This hint sets the order of the joins as they are written in the query. This hint works at query level, so it will affect to all the joins present in the query.

This is an example:

select /*+ fixJoinOrder */ sum(a.i) from t1 a, t2 b where a.v = b.v

2.6. Force Join Type

This hint allows to force the kind of join which will be used between two given tables. This is an example:

select /*+ forceJoinType(NESTED_LOOP, t1, t2) */ sum(a.i) from t1 a, t2 b where a.v = b.v;

Where JOIN_TYPE is one of the available types:

  • CORRELATE

  • MERGE_JOIN

  • NESTED_LOOP

2.7. Or To Union

This hint transforms OR conditions into UNION clauses. This is an example:

select /*+ orToUnion(t1) */ sum(i) from t1
	where i between 1 and 3 and v is not null or i between 18 and 20;

2.8. Or To Union All

This hints transforms OR conditions into UNION ALL clauses. This is an example:

select /*+ orToUnionAll(t1) */ sum(i) from t1
	where i between 1 and 3 and v is not null or i between 18 and 20;

3. Table Hints

3.1. Force the Use of an Index

This hint forces the access to a table through a defined index. To use it invoke the function as follows: An example is:

select sum(i) from t1 /*+ forceAccess(t1) */ a
	where a.i between 1 and 5 and a.i2 between 11 and 20;

3.2. Disable Pushdown

This hint disables the pushdown of operations to the storage engine for a given table. In general, this will negatively impact performance. This means that all the work will be done by the query engine except the scans. This is an example:

select sum(i) from t1  /*+ disablePushdown */ a
	where a.i between 1 and 5 and a.i2 between 11 and 20;

3.3. Skip Online Aggregate

This hint disables translating table’s query into a query over any of its online aggregates. This is an example:

select max(i_price), min(i_price), count(*) from item /*+ skip_oa */;

3.4. Or to Union

This hints transforms OR conditions into UNION clauses. This is an example:

select sum(i) from t1 /*+ orToUnion */
	where i between 1 and 3 and v is not null or i between 18 and 20;

3.5. Or to Union All

This is an example: This hints transforms OR conditions into UNION ALL clauses.

select sum(i) from t1 /*+ orToUnionAll */
	where i between 1 and 3 and v is not null or i between 18 and 20;

3.6. Batch Size

This hint applies only to DMLs and is equivalent to BATCH statements. The following statements are equivalent:

upsert into t2 /*+ batchSize(5000) */ (pk, v, fk)
	select i2, v, i from t1 where v is not null;
batch 5000 upsert into t2 (pk, v, fk)
	select i2, v, i from t1 where v is not null;

4. Context Hints

Context Hints are established within a session for a given connection and affect all queries issued in that session after being set. The actions that can be performed for context hints are:

  1. Enable context hints for the connection

  2. Disable context hints for the connection

  3. List a context hint

  4. Refresh plan cache

  5. Set a context hint

In the following subsections we explain the different steps.

4.1. Enable Context Hints

In order to allow to use context hints, they have to be enabled. This can be done using enableHints table function as follows:

EXEC enableHints()

The output is:

Hints enabled for connection ...

4.2. Disable Context Hints

Disable the hints deletes all hints set in the session. However, the execution plans that have been created using queries with the context hints activated will remain in the plan cache.

This can be done using disableHints function as follows:

EXEC disableHints()

The output is:

Hints disabled for connection ...

4.3. List Context Hints

Lists hints is useful to know the defined context hints for the present connection, and to retrieve their ID (which can be used to remove them).

This can be done using "listHints" function as follows:

EXEC listHints()

The output will be a table with a row for each defined hint and three columns: ID that identifies the hint within the connection, the hint type, and the description of the hint.

4.4. Set Context Hints

In order to set a context hint you have to invoke the appropriate function for the hint you want to define. The following subsections explain each of the available hints.

4.4.1. Force the Use of an Index

This hint forces the access to a table through a defined index. It is used as follows:

EXEC forceAccess('TABLE_NAME','INDEX_NAME')

Where TABLE_NAME is the qualified name of the table (i.e., db-APP-CUSTOMER) and INDEX_NAME is the qualified name of the index (i.e., db-APP-CUSTOMERIDX). This function does not check the existence of the index nor the table so, if you force the access through an inexistent index, the queries that use this table will fail.

4.4.2. Disable Pushdown

This hint disables the pushdown of operations to the storage engine. In general, this will negatively impact performance. This means that all the work will be done by query engine except the scans that are always performed on the storage engine side.

It is used as follows:

EXEC disablePushdown('TABLE_NAME')

Where TABLE_NAME is the qualified name of the table (i.e. db-APP-CUSTOMER).

4.4.3. Fixed Join Order

This hint sets the order of the joins as they are written in the query. It is defined as follows:

EXEC fixJoinOrder()

4.4.4. Force Join Type

This hint allows to force the kind of join which will be used between two given tables. It is set by:

EXEC forceJoinType('JOIN_TYPE', 'TABLE_1', 'TABLE_2')

Where JOIN_TYPE is one of the available types:

  • CORRELATE,

  • MERGE_JOIN,

  • NESTED_LOOP

TABLE_1 and TABLE_2 are the unqualified names of the affected tables (i.e. CUSTOMER or WAREHOUSE). This function does not check that the chosen join type is available for a given pair of tables so if not the query will fail.

4.4.5. Or To Union

This hint transforms OR conditions into UNION clauses. It accepts a Boolean parameter which indicates whether the UNION will be a UNION ALL or not.

It is activated by:

EXEC orToUnion(all)

Where all is true for a UNION ALL or false otherwise.

4.5. Remove Context Hints

A hint can be removed for the present connection. But, if the hint has been used to execute a query and an execution plan has been created and stored in the query plan cache, remove the hint won’t remove the query plan and the queries will continue to use it for their execution.

Removing a hint is performed as follows:

EXEC removeHint(HINT_ID)

Where HINT_ID (an integer, without quotes) is the identifier of the hint that wants to be removed, and can be obtained from the listHints table function.

5. Clean Query Plan Cache

This table function removes all the query plans that have been cached for a given table (not only for this connection but for every query in the server). It can be useful to ensure that a new query plan will be calculated when a hint is added or removed. However, it has an associated overhead, it will force to generate the plan for all subsequent queries that is an expensive operation.

It is invoked as follows:

EXEC cleanPlanCache('TABLE_NAME')

Where TABLE_NAME is the qualified name of the table (i.e. db-APP-CUSTOMER) from which the query plans will be removed.

6. Enable/Disable Planner Logger

These two functions let the user to enable or disable the optimizer logger. To be able to use it, you have to define a logger in your log4j configuration for the class org.apache.calcite.plan.RelOptPlanner with the level you want.

When enabling it, the log level for the RelOptPlanner logger will be set to ALL, and the planner will log all the query optimization steps in the appender configured for the RelOptPlanner logger.

Use it with caution because the amount of generated data can affect negatively the performance of the query preparation.

They can be invoked as follows:

EXEC enablePlannerDebug()
EXEC disablePlannerDebug()

System Tables

LeanXcale’s query engine provides a series of virtual tables giving access to the metadata of stored databases, also known as data dictionaries in other databases.

All the system tables are part of a virtual schema called LXSYS. Each of the following subsections is devoted to each of the tables in the system.

1. LXSYS.TABLES

It shows all the tables. Initially, only metadata tables are shown. The columns of the table are as follows:

  • tableCat: Database on which the table resides.

  • tableSchem: Schema on which the table resides.

  • tableName: Name of the table.

  • owner: User owner of the table.

  • tableType: It is the kind of table. It can have the following values:

    • SYSTEM TABLE.

    • TABLE.

    • SEQUENCE.

    • VIEW.

  • comment: a comment made on the table.

Note that indexes do not appear in this list because they are not tables by themselves.

lx% select * from lxsys.tables;

tableCat    tableSchem    tableName                   owner      tableType     comment

  db          LXSYS         AGGREGATIONS                lxadmin    SYSTEM TABLE
  db          LXSYS         ALL_CONS_COLUMNS            lxadmin    SYSTEM TABLE
  db          LXSYS         AUTO_SPLITS                 lxadmin    SYSTEM TABLE
  db          LXSYS         COLUMNS                     lxadmin    SYSTEM TABLE
  db          LXSYS         COMPONENTS                  lxadmin    SYSTEM TABLE
  db          LXSYS         CONNECTIONS                 lxadmin    SYSTEM TABLE
  db          LXSYS         DATASTORE_STATS             lxadmin    SYSTEM TABLE
  db          LXSYS         FOREIGN_KEYS                lxadmin    SYSTEM TABLE
  db          LXSYS         GROUP_PERMISSION            lxadmin    SYSTEM TABLE
  db          LXSYS         INDEX_COLUMNS               lxadmin    SYSTEM TABLE
  db          LXSYS         INDEXES                     lxadmin    SYSTEM TABLE
  db          LXSYS         ONLINE_AGGREGATE_FILTERS    lxadmin    SYSTEM TABLE
  db          LXSYS         ONLINE_AGGREGATES           lxadmin    SYSTEM TABLE
  db          LXSYS         ONLINE_AGGREGATIONS         lxadmin    SYSTEM TABLE
  db          LXSYS         PLANS_CACHE                 lxadmin    SYSTEM TABLE
  db          LXSYS         PRIMARY_KEYS                lxadmin    SYSTEM TABLE
  db          LXSYS         QUERY_CACHE                 lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_CHECKS                lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_FUNCTIONS             lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_PARTITIONS            lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_STATS                 lxadmin    SYSTEM TABLE
  db          LXSYS         TABLES                      lxadmin    SYSTEM TABLE
  db          LXSYS         TRANSACTIONS                lxadmin    SYSTEM TABLE
  db          LXSYS         TRIGGER_FUNCTIONS           lxadmin    SYSTEM TABLE
  db          LXSYS         TRIGGERS                    lxadmin    SYSTEM TABLE
  db          LXSYS         TYPE_INFO                   lxadmin    SYSTEM TABLE
  db          LXSYS         USER_GROUP                  lxadmin    SYSTEM TABLE
  db          LXSYS         VIEW_TABLES                 lxadmin    SYSTEM TABLE

2. LXSYS.COLUMNS

This system table shows all the columns of all tables in the system. It has the following columns:

  • tableCat: Database on which the table resides.

  • tableSchem: Schema on which the table resides.

  • tableName: Name of the table.

  • columnName: Name of the column.

  • dataType: Type of the column as an INTEGER. The possible values are:

    • 4: INTEGER

    • 12 VARCHAR

    • …​

  • typeName: It is the SQL type name. If NOT NULL constraint is set on the column is shown after the type name. E.g.: INTEGER NOT NULL.

  • columnSize: For VARCHAR columns it provides the max length. For DECIMAL columns it provides the length of the non decimal part. For TIMESTAMP columns it is 6. For other column types is -1.

  • bufferLength. Deprecated. Do not use it.

  • decimalDigits. For DECIMAL columns it provides the number of fractional digits.

  • numPrecRadix. Indicates the base of the numeric columns. It is always 10.

  • nullable. Deprecated. 1 if it can be NULL, 0 if it has NON NULL constraint.

  • remarks. Comment on the column.

  • columnDef. Deprecated. Do not use it.

  • sqlDataType. Deprecated. Do not use it.

  • sqlDatetimeSub. Deprecated. Do not use it.

  • charOctetLength. For VARCHAR columns it shows the max length in bytes.

  • ordinalPosition. The position of the column in the create statement. The first column has position 1.

  • isNullable. YES if it can be NULL, NO if it has NON NULL constraint.

  • scopeCatalog. Deprecated. Do not use it.

  • scopeSchema. Deprecated. Do not use it.

  • scopeTable. Deprecated. Do not use it.

  • sourceDataType Deprecated. Do not use it.

  • isAutoincrement. For non-system tables it is YES if the column is auto-incremental, NO, otherwise. For system tables is blank.

  • isGeneratedcolumn. For non-system tables it is YES if the column is auto-generated, NO, otherwise. For system tables is blank.

Here there is a fragment of the table contents after creating a user table T with two columns PK and C:

lx% select * from lxsys.columns;

  tableCat    tableSchem    tableName                   columnName             dataType    typeName                               columnSize    bufferLength    decimalDigits    numPrecRadix    nullable    remarks    columnDef    sqlDataType    sqlDatetimeSub    charOctetLength    ordinalPosition    isNullable    scopeCatalog    scopeSchema    scopeTable    sourceDataType    isAutoincrement    isGeneratedcolumn

  db          APP           T                           PK                     4           INTEGER                                -1                                             10              1                                                                    -1                 1                  YES                                                                          NO                 NO
  db          APP           T                           C                      12          VARCHAR(4096) CHARACTER SET "UTF-8"    4096                                           10              1                                                                    4096               2                  YES                                                                          NO                 NO
  db          LXSYS         AGGREGATIONS                name                   12          VARCHAR                                -1                                             10              1                                                                    -1                 1                  YES
...

3. LXSYS.PRIMARY_KEYS

It shows the primary keys for user created tables.

The column names of this system table are:

  • columnName: Name of the column.

  • keySeq. 1 if it is a sequence, 0 otherwise.

  • pkName. Same as columnName.

  • tableCat: Database on which the table resides.

  • tableSchem: Schema on which the table resides.

  • tableName: Name of the table.

This is the contents after creating a table t with a column pk as primary key:

lx% select * from lxsys.primary_keys;

  columnName    keySeq    pkName    tableCat    tableSchem    tableName

  PK            0         PK        db          APP           T

4. LXSYS.FOREIGN_KEYS

It contains all the declared foreign keys in the system for user created tables. The column names of this system table are:

  • pktableCat: Database of the table with the referenced primary key.

  • pktableSchem: Schema of the table with the referenced primary key.

  • pktableName: Table with the referenced primary key.

  • pkcolumnName: Column name of the referenced primary key.

  • fktableCat: Database of the table with the foreign key.

  • fktableSchem: Schema of the table with the foreign key.

  • fktableName: Table with the foreign key.

  • fkcolumnName: Column with the foreign key.

  • keySeq: 1 if is a sequence, 0 otherwise.

  • updateRule: Rule for updates on the primary key.

  • deleteRule: Rule for deletes on the primary key.

  • fkName: Name of the foreign key.

  • pkName: Name of the primary key.

  • deferrability: 1 is the foreign key check is delayed to the commit of the transaction, 0 if it is checked immediately after changing the primary key. Currently, not supported. An example of this table after creating a table with a foreign key to another table:

lx% select * from lxsys.foreign_keys;

  pktableCat    pktableSchem    pktableName    pkcolumnName    fktableCat    fktableSchem    fktableName    fkcolumnName    keySeq    updateRule    deleteRule    fkName                   pkName    deferrability

  db            APP             T1             PK              db            APP             T2             FK_TO_T1        0         3             3             FK_db-APP-T2_FK_TO_T1    [PK]      7

5. LXSYS.INDEXES

It shows all the indexes created for user created tables in the system.

The columns are:

  • tableCat: Database on which the table resides.

  • tableSchem: Schema on which the table resides.

  • tableName: Name of the table.

  • non_unique: Non unique constraint.

  • index_qualifier: ??

  • index_name: Name of the index.

  • type: ??

  • ordinal_position: ??

  • column_name: Name of the indexed column.

  • asc_or_desc: 1 if ascending ordered, and 0 if descending ordered.

  • cardinality: Cardinality of the index.

  • pages: ??

  • filter_condition: ??

An example of the table contents with an index created for a table is:

> select * from LXSYS.indexes;

6. LXSYS.TABLE_CHECKS

It shows all the table integrity constraints defined in user created tables (NOT NULL, …​). The columns are:

  • name: Constraint name.

  • expression: Constraint description.

  • tableCat: Database on which the table resides.

  • tableSchem: Schema on which the table resides.

  • tableName: Name of the table.

7. LXSYS.TRANSACTIONS

It shows all active transactions in the Query Engine relating the information to the connection. The following fields are showed:

  • txn. The transaction id.

  • state: State of the transaction. // ?? possible values??

  • conn: Connection id.

  • session: Session id.

  • mode: Isolation level of the session.

  • uid: user that created the session.

  • startTs: Start timestamp of the transaction.

  • time: Epoch with the timestamp at which the transaction was started.

  • commitTs: If the transaction already committed its commit timestamp, -1 otherwise.

lx% select * from LXSYS.TRANSACTIONS;

  txn           state     conn                                    session    mode                  uid        startTs       time               commitTs

  1031227002    ACTIVE    681efbc4-6f56-41ac-b11f-3dc32bbac7f9    6          SNAPSHOT_ISOLATION    lxadmin    1031226001    435391393809375    -1

8. LXSYS.CONNECTIONS

It contains all the currently open connections with the database.

The columns are:

  • connectionId: The identifier of the connection.

  • isCurrent: true if it is the one that you are using, false otherwise.

  • uid: The user who created the connection.

  • isolationLevel: Isolation level of the connection.

  • created: When the connection was created.

  • used: Last time the connection was used.

  • hostIP: the IP from which the client is connected.

  • status: A value indicating the status that can be:

    • ACTIVE when the connection is running an action.

    • IDLE when the connection is not doing anything.

    • FETCHING when the server is sending the resultset to the client side and the client side is iterating the resultset.

  • lastAction: Last action executed.

  • lastSql: Last SQL statement executed.

  • database: Database to which the connection was established.

  • schema: Schema to which the connection was established.

This is an example of the table contents with just the connection being used by the SQL console:

lx% select * from LXSYS.CONNECTIONS;

connectionId                            isCurrent    uid        isolationLevel        created                    used                       hostIP    status    lastAction        lastSql                            database    schema

f262cbab-f6db-4362-aac4-7727a32220ee    true         lxadmin    SNAPSHOT_ISOLATION    2024-02-08 13:45:04.662    2024-02-08 19:18:24.019              ACTIVE    EXECUTE_DIRECT    select * from LXSYS.CONNECTIONS    db          APP

9. LXSYS.ONLINE_AGGREGATIONS

It contains the user defined online aggregations.

Its columns are:

  • onlineRelationTable: Name of the online aggregation table.

  • ordinalPosition: ??

  • name: Name of the online aggregate column.

  • aggType: Kind of aggregation that can be:

    • COUNT.

    • SUM.

    • MAX.

    • MIN.

  • aggExpression: Expression over which the online aggregate is applied.

  • tableCat: Database where is defined.

  • tableSchem: Schema where is defined.

  • tableName: Name of the parent table over which the online aggregation has been defined.

An example follows:

lx% select * from LXSYS.ONLINE_AGGREGATIONS where onlineRelationTable = 'AGG_TRIPS' order by ordinalPosition;

 onlineRelationTable  ordinalPosition  name        aggType  aggExpression          tableCat  tableSchem  tableName
 AGG_TRIPS            1                DURATION    SUM      CAST2INT[$DURATION]    demo      APP         TRIPS
 AGG_TRIPS            2                MAX_START   MAX      $START_TS              demo      APP         TRIPS
 AGG_TRIPS            3                MIN_START   MIN      $START_TS              demo      APP         TRIPS
 AGG_TRIPS            4                PASSENGERS  SUM      CAST2INT[$PASSENGERS]  demo      APP         TRIPS
 AGG_TRIPS            5                COUNT_ALL   COUNT                           demo      APP         TRIPS

10. LXSYS.ONLINE_AGGREGATES

It shows information about the group by expressions for existing online aggregation relations

lx% select * from LXSYS.ONLINE_AGGREGATES where onlineRelationTable = 'AGG_TRIPS' order by ordinalPosition;

 onlineRelationTable  ordinalPosition  name       groupBy                                                                    tableCat  tableSchem  tableName
 AGG_TRIPS            1                END_YEAR   TS_EXTRACT['YEAR', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]   demo      APP         TRIPS
 AGG_TRIPS            2                END_MONTH  TS_EXTRACT['MONTH', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]  demo      APP         TRIPS
 AGG_TRIPS            3                END_DAY    TS_EXTRACT['DAY', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]    demo      APP         TRIPS
 AGG_TRIPS            4                END_HOUR   TS_EXTRACT['HOUR', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]   demo      APP         TRIPS

11. LXSYS.ONLINE_AGGREGATE_FILTERS

It shows information about the filter expressions for existing online aggregation relations

lx% select * from LXSYS.ONLINE_AGGREGATE_FILTERS where onlineRelationTable = 'AGG_TRIPS';

 onlineRelationTable  filter                                  tableCat  tableSchem  tableName
 AGG_TRIPS            (TS_EXTRACT['YEAR', $START_TS] = 2021)  demo      APP         TRIPS

12. LXSYS.AGGREGATIONS

It shows information about the aggregation columns for user defined tables.

lx% select * from LXSYS.AGGREGATIONS;

name         aggType  tableCat  tableSchem  tableName
 PASSENGERS  SUM      demo      APP         MANUAL_AGG_TRIPS
 DURATION    SUM      demo      APP         MANUAL_AGG_TRIPS
 MIN_START   MIN      demo      APP         MANUAL_AGG_TRIPS
 COUNT_ALL   SUM      demo      APP         MANUAL_AGG_TRIPS
 MAX_START   MAX      demo      APP         MANUAL_AGG_TRIPS

13. LXSYS.TABLE_PARTITIONS

It shows information about the table partitions and the split points for each partition.

The columns are:

  • catalog: Database name.

  • schema: Schema name.

  • tableName: Table name.

  • regionId: Id of the partition/region/fragment.

  • address: Id of the storage server holding the partition.

  • minValue: Min value contained by the partition. MIN means the lowest possible value.

  • maxValue: Max value contained by the partition. MAX means the highest possible value.

This is an example with a table with a single partition containing all data.

lx% select * from LXSYS.TABLE_PARTITIONS order by tableName;

  catalog    schema    tableName       regionId      address    minValue    maxValue

  db         APP       T            1649267441666    ds100      MIN         MAX

14. LXSYS.TABLE_STATS

It contains the collected statistics and their values.

Its columns are:

  • catalog: Database name.

  • schema: Schema name.

  • tableName: Table name.

  • regionId: Id of the partition/region/fragment.

  • address: Id of the storage server holding the partition.

  • stat: Name of the stat.

  • value: Value of the stat.

This is an example of the stats of an empty table:

lx% select * from LXSYS.TABLE_STATS;

  catalog    schema    tableName    regionId         address    stat        value

  db         APP       T            1649267441666    ds100      nndwr       1
  db         APP       T            1649267441666    ds100      nmbs        3
  db         APP       T            1649267441666    ds100      rtime       1707415699
  db         APP       T            1649267441666    ds100      depth       1
  db         APP       T            1649267441666    ds100      nblkwr      4
  db         APP       T            1649267441666    ds100      noldused    1
  db         APP       T            1649267441666    ds100      nsplit      1
  db         APP       T            1649267441666    ds100      noldblks    1
  db         APP       T            1649267441666    ds100      mtime       1707415699
  db         APP       T            1649267441666    ds100      stime       1707415699
  db         APP       T            1649267441666    ds100      wtime       1707415699
  db         APP       T            1649267441666    ds100      nnd         1
  db         APP       T            1649267441666    ds100      nused       1
  db         APP       T            1649267441666    ds100      ctime       1707415699
  db         APP       T            1649267441666    ds100      ncache      2
  db         APP       T            1649267441666    ds100      nblks       3

15. LXSYS.DATASTORE_STATS

It shows information about table statistics kept by the storage servers.

Its columns are:

  • address: Id of the storage server holding the partition.

  • stat: Name of the stat.

  • value: Value of the stat.

This shows the number of blocks held in each storage server:

lx% select * from LXSYS.DATASTORE_STATS where stat like '%nblks%';

  address    stat          value

  ds103      mblk/nblks    8192
  ds102      mblk/nblks    8192
  ds101      mblk/nblks    8192
  ds100      mblk/nblks    8192

16. LXSYS.USER_GROUP

Deprecated. Not to be used.

17. LXSYS.GROUP_PERMISSION

It contains the permissions that have been granted. Its columns are:

  • groupId

  • path

  • catalog

  • schema

  • entityName

  • perms

lx% select * from LXSYS.GROUP_PERMISSION where groupId = 'APP';
 groupId  path  catalog  schema  entityName  isRead  isWrite  isAlter
 APP      /seq  demo     APP                 true    true     true
 APP      /tbl  demo     APP                 true    true     true

18. LXSYS.PLANS_CACHE

It shows execution plans that are currently cached.

Its columns are:

  • initialPlan: Initial query plan after compilation.

  • executionPlan: Final query plan produced by the optimizer.

  • tableCat: Database name.

  • tableSchem: Schema name.

  • tableName: Table name.

An example with a fragment of the contents is:

lx% select * from LXSYS.PLANS_CACHE;

initialPlan                                            executionPlan                                                        tableCat    tableSchem    tableName

LogicalProject(inputs=[0..4])
  LogicalTableScan(table=[[db, LXSYS, PLANS_CACHE]])   EnumerableTableScan.ENUMERABLE.[](table=[db, LXSYS, PLANS_CACHE])    db          LXSYS         PLANS_CACHE
...

19. LXSYS.TRIGGER_FUNCTIONS

It contains all predefined and user defined trigger functions.

Its columns are:

  • triggerName: Name of the trigger function.

  • description: Short description of the purposed of the trigger.

An example is:

%lx select * from LXSYS.TRIGGER_FUNCTIONS;

  triggerName                  description

  HashFieldTrigger             Hash Field Trigger
  UUIDTrigger                  UUID Trigger
  AutoGeohashTrigger           Auto Geohash Trigger
  AutoIncrementTrigger         Auto Increment Trigger
  DefaultValueTrigger          Default Value Trigger
  Trigram_Trigger              Trigger to maintain trigram index
  ForceAutoIncrementTrigger    Force Auto Increment Trigger
  DeleteForeignKeyTrigger      Check for Referential Integrity when deleting row from referenced table
  ForeignKeyTrigger            Check for Referential Integrity

20. LXSYS.TRIGGERS

It contains all predefined and user defined trigger functions.

Its columns are:

  • triggerName: Name of the trigger.

  • triggerFunction: Name of the trigger function.

  • args: parameters of the trigger.

  • isBefore: true if executed before the triggering action, false otherwise.

  • isInsert: true if the triggering action is an insert, false otherwise.

  • isUpdate: true if the triggering action is an update, false otherwise.

  • isDelete: true if the triggering action is a delete, false otherwise.

  • priority: Specifies the precedence of a trigger compared to other triggers that are defined on the same event. Triggers with lower priority values are executed before triggers with higher priority values.

  • isEnable: true if it is currently enabled, false if disabled.

  • tableCat: Database name.

  • tableSchem: Schema name.

  • tableName: Table name.

%lx select * from LXSYS.TRIGGERS;
triggerName              triggerFunction       args                      isBefore  isInsert  isUpdate  isDelete  priority  tableCat            tableSchem  tableName

 I_autoIncrementTrigger  autoIncrementTrigger  I,NOUSR.TAUTO_IDENTITY_I  true      true      false     false     100       contrib_regression  NOUSR       TAUTO
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       TESTTAB_FK
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       BYTEATAB
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       BOOLTAB
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       BYTEATAB2

21. LXSYS.TABLE_FUNCTIONS

It shows available table functions.

Its columns are:

  • functionName: Name of the table function.

  • parameters: List of pararmeters.

  • description: Description of how to use it.

An example with a short fragment of the contents is:

lx% Select * from LXSYS.TABLE_FUNCTIONS;

functionName            parameters           description
oaValidator             (String, String)     oaValidator receives Online Aggregator Name ...
...

22. LXSYS.VIEW_TABLES

It shows user defined views.

Its columns are:

  • query: The query associated to the view.

  • holder: //?? what is this

  • tableCat: Database name.

  • tableSchem: Schema name.

  • tableName: Table name.

And example is:

lx% Select * from LXSYS.VIEW_TABLES;

query                             holder        tableCat  tableSchem  tableName
SELECT * FROM "T1" WHERE "I" = 8  adhoc-APP-T1  adhoc     APP         MYVIEW

Table Functions

A table function refers to a user-defined function that returns a table or a set of rows as its result. Table functions are used to encapsulate logic and calculations, allowing you to treat the result of the function as if it were a table in your SQL queries.

1. How to Develop and Deploy a Table Function

First you need to add the LeanXcale maven repository and the qe-tableFunctions dependency to your pom.xml file:

<repositories>
    <repository>
        <id>maven-releases</id>
        <url>https://nexus.leanxcale.com/repository/maven-releases</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.leanxcale</groupId>
        <artifactId>qe-tableFunctions</artifactId>
        <version>1.7.6</version>
    </dependency>
</dependencies>

In order to create a custom table function, a Java class has to be written implementing the AutoloadTableFunction interface and defining the following methods:

Function Description Return

getFunctionName()

Define the custom table function name

String

getDescription()

Provides the custom table function description

String

eval(<table function arguments>)

Generate the collection of rows to be queried

ScannableTable

You can define as many eval functions as you need as long as their signatures are different. In order to write the eval functions returning ScannableTable, the following methods should be implemented:

  • getRowType

    • RelDataType getRowType(RelDataTypeFactory typeFactory)

    • Description:

      • Returns this table’s row type. This is a struct type whose fields describe the names and types of the columns in this table. The implementer must use the type factory provided. This ensures that the type is converted into a canonical form; other equal types in the same query will use the same object.

    • Parameters:

      • typeFactory: Type factory with which to create the type.

    • Returns:

      • Row type

  • getStatistic

    • Statistic getStatistic()

    • Description:

      • Statistic getStatistic()

    • Returns:

      • Statistic

  • getJdbcTableType

    • Description:

      • Returns the type of table.

    • Returns:

      • Table type.

  • sRolledUp

    • Description:

      • Determines whether the given column has been rolled up.

    • Returns:

      • true if the column is rolled up.

  • rolledUpColumnValidInsideAgg

    • Description:

      • Determines whether the given rolled up column can be used inside the given aggregate function. You can assume that isRolledUp(column) is true.

    • Parameters:

      • column - The column name for which isRolledUp is true.

      • call - The aggregate call.

      • parent - Parent node of call in the SqlNode tree.

      • config - Config settings. May be null.

    • Returns:

      • true if the given aggregate call is valid.

All the table function classes must belong to the package com.leanxcale in order to have them auto-loaded when LeanXcale starts.

Find below a complete example:

package com.leanxcale;

import com.leanxcale.QE.tableFunction.AutoloadTableFunction;
import org.apache.calcite.DataContext;
import org.apache.calcite.config.CalciteConnectionConfig;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.calcite.util.ImmutableBitSet;
import java.util.Collections;

public class SeriesGeneratorTableFunction implements AutoloadTableFunction {

  public static String getFunctionName() {
    return "series_generator";
  }

  public ScannableTable eval(Integer start, Integer stop, Integer step) {
    int actualStep = (null == step) ? 1 : step;
    int numResults = (1 + (stop - start)/actualStep);
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        Object[][] results = new Object[numResults][1];
        for (int inx = 0; inx < numResults; inx++) {
          results[inx][0] = start + inx*actualStep;
        }
        return Linq4j.asEnumerable(results);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
                .add("SERIES", SqlTypeName.INTEGER)
                .build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(numResults, Collections.singletonList(ImmutableBitSet.of(numResults)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

Once you do mvn package, a .jar file gets created with the custom table functions you defined. This .jar should be included in the Query Engine’s classpath (lxs/LX-BIN/lib). The Query Engine should be restarted after adding the table function .jar file. For the cloud version, the leanxcale console will provide the means to add the table function code to your database.

2. Using Table Functions

You can use a table function in the FROM clause of a query as the first argument to the function table. In this example, series_generator is the table function:

SELECT SERIES FROM table(series_generator(1, 10, 1)) WHERE SERIES > 3;

+--------+
| SERIES |
+--------+
| 4      |
| 5      |
| 6      |
| 7      |
| 8      |
| 9      |
| 10     |
+--------+
7 rows selected (0.008 seconds)

Take into account that if the parameter you’re passing to your table function is a string, you’ll need to use single quotes. Double quotes are not allowed. This is an example:

SELECT * FROM table(my_new_tablefunction('test_string'));

But this does not work:

SELECT * FROM table(my_new_tablefunction("test_string"));

3. Table Function Utils

In order to ease the writing of table function, you can use the TableFunctionUtils class. You will be able to execute SQL queries within your Table Function.

ResultType Function Description

TableFunctionUtils

utils()

Static function to build an instance for the table function utils according to the current connection

Connection

getConnection()

Provide an connection to perfom SQL actions. This connection cannot be committed or rollbacked

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID ////

Session

session(Settings Settings)

Creates a new session to the KiVi database. It shares the current KiVi connection.

Session

session()

Creates a new session to the KiVi database. It shares the current KiVi connection and transaction to connect through the direct API. ////

Table

getTable(String tableName)

Provide a table object that exist within current transaction context so you can perform basic operations on it

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

Find below an example:

public class TableFunctionUtilsExample implements AutoloadTableFunction {
  public static final String functionName = "TableFunctionUtilsExample";

  public static String getFunctionName() {
    return functionName;
  }

  public static String getDescription() {
    return functionName;
  }

  public ScannableTable eval(Object parentTableName, Object sonTableName) {
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        List<Object[]> result = new ArrayList<>();

        TableFunctionUtils utils = TableFunctionUtils.utils();
        Settings settings = new Settings();
        settings.setDatabase(utils.getDbName());
        settings.setSchema(utils.getSchemaName());
        settings.disableConflictChecking();
        settings.disableLogging();
        try (Session session = utils.session(settings)) {
          Table table = session.database().getTable((String) parentTableName);
          try (ResultSet rs = utils.getConnection().createStatement().executeQuery("select i_name, max_price from " + sonTableName)) {
            while (rs.next()) {
              Tuple tuple = table.find()
                  .filter(Filters.eq("i_name", rs.getString(1)).and(Filters.eq("i_price", rs.getInt(2))))
                  .iterator().next();
              Object[] row = new Object[3];
              row[0] = tuple.get("i_id");
              row[1] = tuple.get("i_name");
              row[2] = tuple.get("i_price");
              result.add(row);
            }
          }
        } catch (Exception exception) {
          throw new LeanxcaleRuntimeException(exception);
        }

        return Linq4j.asEnumerable(result);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        RelDataTypeFactory.Builder builder = typeFactory.builder();
        builder.add("ID", SqlTypeName.INTEGER);
        builder.add("NAME", SqlTypeName.VARCHAR);
        builder.add("PRICE", SqlTypeName.INTEGER);
        return builder.build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(1, Collections.singletonList(ImmutableBitSet.of(1)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

DML Triggers

A trigger is procedural code that is automatically executed in response to certain events on a particular table. DML triggers run when a user tries to modify data through a data manipulation language (DML) event. They can be used to enforce business rules and data integrity, query other tables, and include complex statements. DML events are INSERT, UPDATE, or DELETE statements on a table.

CREATE TRIGGER myTrigger BEFORE INSERT UPDATE ON t
FOR EACH ROW EXECUTE genericBackupTrigger('t_backup,ts')
PRIORITY 10;

DISABLE TRIGGER myTrigger ON t;

ENABLE TRIGGER myTrigger ON t;

DROP TRIGGER myTrigger ON TABLE t;

1. CREATE TRIGGER Statement

The syntax to create a trigger is as follow:

createTrigger:
      CREATE TRIGGER [ IF NOT EXISTS ] triggerName
      (BEFORE | AFTER)
      [INSERT] [UPDATE] [DELETE]
      ON tableName FOR EACH ROW EXECUTE
      triggerFunction ['(' arg ')']
      [PRIORITY priority]

where

  • triggerName is the name used by the user to identify the trigger.

  • tableName is the table where the trigger applies

  • triggerFunction is the Trigger Function name as the result of getTriggerName(). See Trigger Function interface

  • arg is the trigger function argument as string literal passed in triggerExecutor(). Check [triggerExecutor params]

  • priority is an integer literal used to order teh trigger execution in case that the table have more than one. Note that the resulting new values from one trigger will be passed to the execution of the next one.

Note that you can define 1 to 3 dml events among INSERT, UPDATE and DELETE.

2. Trigger Functions System Table

To find out the loaded trigger functions, you can query TRIGGER_FUNCTIONS system table.

> select * from LXSYSMETA.TRIGGER_FUNCTIONS;
+---------------------------+-------------------------------------------------------------------------+
|        triggerName        |                               description                               |
+---------------------------+-------------------------------------------------------------------------+
| HashFieldTrigger          | Hash Field Trigger                                                      |
| UUIDTrigger               | UUID Trigger                                                            |
| AutoGeohashTrigger        | Auto Geohash Trigger                                                    |
| ForceAutoIncrementTrigger | Force Auto Increment Trigger                                            |
| AutoIncrementTrigger      | Auto Increment Trigger                                                  |
| DefaultValueTrigger       | Default Value Trigger                                                   |
| DeleteForeignKeyTrigger   | Check for Referential Integrity when deleting row from referenced table |
| ForeignKeyTrigger         | Check for Referential Integrity                                         |
+---------------------------+-------------------------------------------------------------------------+

3. Custom Trigger Functions

Trigger functions are java objects that are automatically loaded into the Query Engine and provide a mechanism to build the Trigger’s executors, that holds the actual trigger procedural code. In order to create a custom trigger function, you need to implement from TriggerFunctionI Interface and define the following methods:

ResultType Function Description

String

getTriggerName()

Define the trigger function name

String

description()

Define a description for the trigger function so users can figure out what it does

Object

buildArguments(String arg)

Build an instance of argumntes from freestyle string. The instance is passed to triggerExecutor method.

TriggerExecutor

triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object args)

Creates an istance of TriggerExecutor. See Trigger Executor Method

public class GenericBackupTriggerExample implements TriggerFunctionI {
  static String triggerName = "genericBackupTrigger";

  @Override
  public String getTriggerName() {
    return triggerName;
  }

  @Override
  public Object buildArguments(String arg) {
    return arg.split(",");
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    String[] args = (String[]) arg;
    return new GenericBackupTriggerExecutor(triggerUtils, newValues, oldValues, args[0], args[1]);
  }

  @Override
  public String description() {
    return "Generic backup Trigger example";
  }
}

3.1. Method triggerExecutor

The method triggerExecutor builds an instance of TriggerExecutor accordingly to the given params.

Param position Param type Description

1

TriggerUtils

Utils. See Trigger Utils

2

TypedNamedValuesI

New values for table’s row. Null on deletion

3

TypedNamedValuesI

Old values for table’s row. Null on insertion

4

String

Trigger argument

3.2. Custom triggerExecutor

Trigger’s executors are the ones that actually holds the procedural code to be executed when a trigger is fired. You would just need to extend from TriggerExecutor abstract class and define the following methods:

ResultType Function Description

void

execute()

Executes the procedural code

In order to implement the trigger execution code, you might probably need to use the following final methods:

ResultType Function Description

boolean

hasValues()

Tell if the new values are available

boolean

hasOldValues()

Tell if the old values are available

ExpressionType

getType(String column)

Get the column’s type

Object

getValue(String column)

Get new value for column

Object

getOldValue(String column)

Get old value for column

void

setValue(String column, Object value)

Set new value for column

String[]

valuesFieldNames()

Get values field names

Note that TriggerExecutor abstract class constructor has the following parameters:

Param position Param type Description

1

TypedNamedValuesI

New values for table’s row

2

TypedNamedValuesI

Old values for table’s row

  static public class GenericBackupTriggerExecutor extends TriggerExecutor {

    public final String backupTableName;
    public final String tsColumnName;
    public final TriggerUtils triggerUtils;

    public GenericBackupTriggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, String backupTableName, String tsColumnName) {
      super(newValues, oldValues);
      this.backupTableName = backupTableName;
      this.tsColumnName = tsColumnName;
      this.triggerUtils = triggerUtils;
    }

    @Override
    public String getExecutorName() {
      return triggerName;
    }

    @Override
    public void execute() {
      if (!hasValues()) { // delete
        return;
      }
      try (Session session = triggerUtils.session()) {
        Table table = session.database().getTable(backupTableName);
        Tuple tuple = table.createTuple();
        for (String column : valuesFieldNames()) {
          tuple.put(column, getValue(column));
        }
        tuple.put(tsColumnName, Timestamp.valueOf(LocalDateTime.now()));
        table.insert(tuple);
        // no need to commit when sharing QE's transaction
      } catch (Exception exception) {
        throw new LeanxcaleRuntimeException(exception);
      }
    }
  }

4. Trigger Utils

An instance of trigger utils provides connection information and some other functionalities that could be useful in order to implemet a trigger like creating a no-SQL direct session, get the next or current value of a sequence, generate a uuid, etc. Find below the available methods:

ResultType Function Description

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID

Connection

getConnection()

Get current JDBC connection

Session

session(Settings Settings)

Creates a new session to a Kivi database. It shares the current kivi connection.

Session

session()

Creates a new session to a Kivi database. It shares the current kivi connection and transaction to connect through the direct API.

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

As you might realize, TableFunctionUtils is an extended implementation of TriggerUtils, see Table Function Utils. You could use TableFunctionUtils within trigger’s code but we strongly recommend to use the given TriggerUtils instance so the trigger can be fired by no-SQL API events too.

5. Expose a Trigger as a Table Function

For developments environments, it could be interesting to execute a table funtion in order to test a trigger behavior. In order to do this, we provide an abstract class TriggerTableFunction that will save you from do it yourself. As you can see in the following example, you would just need to add the definitions of the table’s function methods getFunctionName() and eval(..).

public class BackupTriggerExample extends TriggerTableFunction {

  static String triggerName = "backupTrigger";
  static String[] columnNames = {"PK1", "PK2", "F3", "F4"};

  public static String getFunctionName() {
    return triggerName;
  }

  @Override
  public String getTriggerName() {
    return getFunctionName();
  }

  @Override
  public Object buildArguments(String s) {
    return s;
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    return new BackupTriggerExecutor(newValues, oldValues);
  }

  @Override
  public String description() {
    return "Trigger example";
  }

  public ScannableTable eval(Long pk1, String pk2, String f3, Integer f4) {
    Object[] values = {pk1, pk2, f3, f4};
    int[] types = {Types.BIGINT, Types.VARCHAR, Types.VARCHAR, Types.INTEGER};
    NamedValuesI newValues = new TypedNamedValuesImpl(columnNames, values, types);
    return evalTrigger(newValues, null, buildArguments(null));
  }
}

Don’t forget to return the evalTrigger(NamedValuesI newValues, NamedValuesI oldValues, String args) result so the trigger is executed and its result is returned by the table function. Note that evalTrigger params will be mapped to the triggerExecutor ones, and it will invoke buildArguemnts.

 try (ResultSet rs = s.executeQuery("exec backupTrigger(1, 'a', 'af3', null)")) {
        assertThat(rs.next(), is(true));
        assertThat(rs.getLong(1), is(1L));
        assertThat(rs.getString(2), is("a"));
        assertThat(rs.getString(3), is("af3"));
        assertThat(rs.getInt(4), is(100));
        assertThat(rs.next(), is(false));
      }

5.1. Typed Named Values

The TypedNamedValuesI interface provides a interface to access a value by name. This is the following:

ResultType

Function

Description

ExpressionType

getType(String name)

Get the column’s type

Object

getValue(String name)

Get the value for name. Null when not found or value is null

void

setValue(String name, Object value)

Set the value for name

String[]

getFieldNames()

Get field names

Object[]

getValues()

Get values ordered as getFieldNames()

boolean

contains(String name)

Tells if name exists

int

size()

Tells size of the values

String

sqlTableName()

Table’s name the values belong to. Null when unknown

Anyway, we provide a basic TypedNamedValuesI implementation that might satisfy your needs.

  /**
   * Create a TypedNamesValues
   * @param positionsMap Map value position by name
   * @param row values
   * @param typeMap Map SQL type by name
   */
  public TypedNamedValuesImpl(Map<String, Integer> positionsMap, Object[] row, Map<String, ExpressionType> typeMap) {
    ...
  }

  /**
   * Create a NamedValues object
   * @param names values names
   * @param row values ordered by names
   * @param types  SQL types ordered by names
   */
  public TypedNamedValuesImpl(String[] names, Object[] row, int[] types) {
    ...
  }

5.2. Expression Type

The ExpressionType interface provides information about a table’s column or an expression. This is the following:

ResultType

Function

Description

int

sqlType()

Expression’s SQL JDBC type

int

precision()

Expression’s SQL JDBC type precision

int

scale()

Expression’s SQL JDBC type scale

boolean

isNullable()

Tells if the expression is nullable

6. System Triggers

Despite of custom triggers, we provide some system triggers that could be used directly by the user.

6.1. Auto Increment Trigger

This trigger is created automatically for a table when using the IDENTITY clause on table’s creation or alteration. Its execution fills the desired column with sequence’s next val when it is null. Its argument is a list of strings separated by coma. Find below their meaning:

Argument position

Description

1

Name of the column to be assigned with sequence’s next val when null

2

Sequence name. We suggest to include the schema name so the trigger won’t fail when connected to a different schema

The following table’s creation is equivalent to the one below:

create table tauto (i int AS IDENTITY start with 10 increment by 10, v VARCHAR, PRIMARY KEY (i));
create table tauto (i int, v VARCHAR, PRIMARY KEY (i));

create sequence tauto_i_seq start with 10 increment by 10;

CREATE TRIGGER tauto_i_seq_trigger BEFORE INSERT ON tauto
FOR EACH ROW EXECUTE autoIncrementTrigger('i,tauto_i_seq')
PRIORITY 100;

-- Note that the autoincrement colums will become an implicit column for inserts
insert into tauto values ('value');

6.2. Geohash Trigger

This trigger is created automatically for a table when using the PRIMARY GEOHASH KEY or GEOHASH clauses on table’s creation. Hidden geohash fields are created too when using these clauses. The trigger execution fills the desired column with the calculated geohash when it is null. Its argument is a list of strings separated by coma. Find below their meaning:

Argument position

Description

1

Name of the column to be assigned with the calculated geohash

2

WKT field if there is only two arguments, or Latitude field if there is three

3

Longitude field if there is three arguemnts

The following table’s creation is equivalent to the one below:

create table t (wkt VARCHAR, latitude double, longitude double, PRIMARY GEOHASH KEY (wkt), GEOHASH (latitude,longitude));
create table t (wkt VARCHAR, latitude double, longitude double, gh_wkt VARCHAR PRIMARY KEY, gh_lat_long VARCHAR);

CREATE TRIGGER gh_wkt_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_wkt,wkt')
PRIORITY 100;

CREATE TRIGGER gh_lat_long_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_lat_long,latitude,longitude')
PRIORITY 100;

-- Note that GEOHASH clause will create and index too, so we create it here manually
CREATE INDEX gh_lat_long_index ON t (gh_lat_long);

-- Note that the geohash column will become an implicit column for inserts and selects
insert into t values (ST_asText(ST_MakePoint(3.41,40.25)),cast(40.25 as double), cast(3.41 as double));

Geohash fields are fields indexed geographically for GIS applications

6.3. UUID Trigger

This trigger is created automatically for a table when not defining a PRIMARY KEY. A hidden UUID column is created too. The trigger execution fills the desired column with an UUID valuen when it is null. Its argument is Name of the column to be assigned with uuid.

The following table’s creation is equivalent to the one below:

create table tauto (v VARCHAR);
create table t (id VARCHAR, v VARCHAR, PRIMARY KEY (id));

CREATE TRIGGER uuid_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE UUIDTrigger('id')
PRIORITY 100;

-- Note that the uuid column will become a implicit column for inserts and selects
insert into t values ('value');

6.4. Referential Integrity Triggers

Referential integrity is implemented as well using triggers. Every time a Foreign key constraint is added to a table, two triggers are added to the table and one to the referenced table. They are disabled by default. You should enable them to activate referential checks.

> create table tpk (i int, v VARCHAR, i2 int, PRIMARY KEY (i));

> create table tfk (pk int, v VARCHAR, fk int, PRIMARY KEY (pk), CONSTRAINT fKi2 FOREIGN KEY (fk) REFERENCES tpk(i));

> ENABLE TRIGGER fKi2 ON tfk;

> ENABLE TRIGGER fKi2 ON tpk;

> select * from LXSYSMETA.TRIGGERS;
triggerName      FKI2
triggerFunction  DeleteForeignKeyTrigger
args             {"fkTableName":"APP.TFK","columns":["I"],"fkColumns":["FK"]}
isBefore         true
isInsert         false
isUpdate         false
isDelete         true
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TPK

triggerName      FKI2_NO_CONFLICT
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":false,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         false
tableCat         db
tableSchem       APP
tableName        TFK

triggerName      FKI2
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":true,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TFK

The two triggers added to the table use the same trigger function ForeignKeyTrigger but with different params. One will only check that the referenced key exist and the other will create a conflict on it too. Both will be triggered on insertion. You would only need enable the one you need.

The trigger on the referenced table DeleteForeignKeyTrigger would prevent from deleting a row whose primary key is referenced by the table that holds the foreing key. Note that an index with same constraint name is created along with the foreign key’s triggers to improve DeleteForeignKeyTrigger performance.

7. NOTICE

As stated, the work of the LeanXcale Query Engine forked from Apache Calcite. For the drivers, Apache Avatica framework is also used:

Apache Calcite
Copyright 2012-2019 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

This product is based on source code originally developed
by DynamoBI Corporation, LucidEra Inc., SQLstream Inc. and others
under the auspices of the Eigenbase Foundation
and released as the LucidDB project.

The web site includes files generated by Jekyll.
Apache Calcite -- Avatica
Copyright 2012-2019 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

Geohash coding is based on https://github.com/kungfoo/geohash-java with the following license file:

Copyright 2016 Silvio Heuberger and contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

LeanXcale system uses spread as communication bus under the following license:

Version 1.0
June 26, 2001

Copyright (c) 1993-2016 Spread Concepts LLC. All rights reserved.

This product uses software developed by Spread Concepts LLC for use in the Spread toolkit.
For more information about Spread, see http://www.spread.org