Data Definition Language (DDL) Syntax

The SQL Data Definition Language (DDL) provides users with the capability to create tables, indexes, sequences, views, online aggregates, as well as databases, users, roles, and permissions. These are the supported statements:

ddlStatement:
	| createDatabaseStatement
	| dropDatabaseStatement
	| createSchemaStatement
	| dropSchemaStatement
	| createTableStatement
	| dropTableStatement
	| truncateTableStatement
	| alterTableStatement
	| createIndexStatement
	| dropIndexStatement
	| createViewStatement
	| dropViewStatement
	| createSequenceStatement
	| dropSequenceStatement
	| createOnlineAggregate
	| dropOnlineAggregate
	| createUserStatement
	| dropUserStatement
	| grantStatement
	| revokeStatement
	| createRoleStatement
	| dropRoleStatement
	| createTriggerStatement
	| dropTriggerStatement
	| enableTriggerStatement
	| disableTriggerStatement

1. Predefined Database, Schema, Admin User, Default Schemas

LeanXcale has a predefined database that is DB. LeanXcale has a predefined schema APP in all databases. The user LXADMIN is administrator of all schemas of all databases in the system. Each user has a schema associated that has the same name as the user. The default schema for LXADMIN is APP. The default schema for all other users is their own schema. When database objects are created without qualifying the name with the schema they are created in the default schema of the user.

2. CREATE DATABASE and DROP DATABASE Statements

The CREATE DATABASE allows to create a new database:

createDatabaseStatement=
	CREATE DATABASE databaseName [USER userName IDENTIFIED BY 'password']*

The user LXADMIN is administrator of all databases in the system. If the optional USER clause is provided, then a user userName is created within the new database and it becomes the owner of the database, that is, it has all permissions over the newly created database. Note that a connection is over a particular database. So after creating a database to be able to access it, it is needed to reconnect to the system with the new database.

These are two examples of use:

CREATE DATABASE MYDB

CREATE DATABASE MYDB2 USER USR IDENTIFIED BY 'foobar'

The first example creates the database without specifying the owner, so the owner by default will be LXADMIN. The second example creates the database with user USR as owner. This user is created as part of the database creation and made the owner of the database. Remember that owners are local to each database, so USR is a user only on MYDB2 database.

The DROP DATABASE enables to remove a database:

dropDatabaseStatement=
	DROP DATABASE databaseName

An example removing one of the databases we created before is:

DROP DATABASE MYDB

3. CREATE SCHEMA AUTHORIZATION Statement

The CREATE SCHEMA AUTHORIZATION enables to perform a batch of creation of tables and views, and grant permissions on a given schema:

createSchemaStatement=
	CREATE SCHEMA AUTHORIZATION schemaName
		createAndGrantStatement [createAndGrantStatement]*

createAndGrantStatement=
	createTableStatement
	| createViewStatement
	| grantStatement

Note that the schema should exist, since the statement only performs the specified batch of creates and grants over the existing schema. If one of the create statements fails, it will not continue processing the rest.

4. CREATE TABLE Statement

The CREATE TABLE command allows the creation of a table and follows the specified syntax as outlined below:

createTableStatement=
	CREATE [CACHE] [DELTA] TABLE
	[ IF NOT EXISTS ] name
	[ NO OLD VERSIONS ]
	[ '(' tableElement [, tableElement ]* ')' ]
	[ AS queryStatement ] [ partitionBySpec [partitionBySpec]*  [ distributeBySpec  ]* ]

4.1. Table Types

The purpose of table creation is to define the structure of the table, including column names, attributes, and data types. There are three types of tables: regular, cache, and delta.

Regular tables are created using CREATE TABLE. Cache tables (CREATE CACHE TABLE) are always stored in memory, in addition to disk, to ensure rapid access. Delta tables (CREATE DELTA TABLE) include delta columns, allowing the aggregation of values with conflict-free updates, such as a sum. However, it is recommended to use online aggregates that is a high level functionality. Delta tables requires to have certain knowledge of commutative concurrency control.

4.2. IF NOT EXISTS Clause

Attempting to create a table that already exists results in an error. To avoid this error, the IF NOT EXISTS clause can be used, ensuring that the table is only created if it does not already exist.

4.3. Old Versions

LeanXcale as most modern databases, employs Multi-Version Concurrency Control (MVCC). Under MVCC, updates are not executed in place; instead, a fresh version of the tuple is generated with each update. This mechanism enables efficient concurrency control by avoiding read-write conflicts.

In specific scenarios, when it is known that the application exclusively performs either data insertion or data reading, but not both concurrently, it is convenient to disable multi-versioning to improve performance. This action is undertaken to remove the associated overhead and increase the efficiency of insert and update operations within the LeanXcale database (i.e., all the overhead of concurrency control is avoided).

4.4. Column Names, Types, Constraints, Online Aggregates

The list of tableElements is where it is specified the column names and any attributes associated to them. A tableElement can be either a column specification or a constraint specification as can be seen in the following syntax:

tableElement=
	tableColumn
	| [ CONSTRAINT name ] tableConstraint

A tableColumn has the following syntax:

tableColumn=
	columnName [kindOfDeltaColumn] type [ NOT NULL ] [ columnGenerator ] [ PRIMARY KEY ]

In its most fundamental form, a column specification consists of a column name followed by its data type. The inclusion of the modifier NOT NULL imposes a constraint, indicating that the column cannot contain NULL values. Additionally, the PRIMARY KEY modifier designates that the column is a part of the primary key for the table.

Columns can also be automatically generated in three distinct ways. The first method involves providing a default value to be used when no explicit value is provided. The second method employs an auto-incrementing value, achieved through the use of the AS IDENTITY clause. The third method utilizes a delta column, wherein updates and inserts are treated as an accumulation function over the current value of the column. The first two ways are indicated in the columnGenerator syntax:

columnGenerator=
	DEFAULT expression
	| [ GENERATED ALWAYS ] AS IDENTITY
	[ START WITH initialValue ]
	[ INCREMENT BY incrementValue ]

The AS IDENTITY clause indicates that is an auto-increment key. It supports also the form GENERATED ALWAYS AS IDENTITY for compatibility with the PosgreSQL dialect. The START WITH subclause allows to specify the first value to be given to the auto-increment column. The INCREMENT BY subclause enables to indicate the value by which to increment the auto-increment column.

The third way is specified with the kindOfDeltaColumn syntax:

kindOfDeltaColumn =
	( SUM | MIN | MAX | COUNT )

It should be noted that in delta columns the value passed in the update, it is actually considered as a delta value to be aggregated with the current value of the column. The kind of delta column determines the aggregation to be applied with each update. For instance, when utilizing the SUM function, each update entails updating the SUM column with the previous value of the column augmented by the value provided in the update. MIN and MAX functions operate by obtaining the minimum or maximum, respectively, over the current value of the column and the new value introduced in the update. In the case of COUNT, it increments the previous value of the column by one, irrespective of value provided in the update.

Additionally, various constraints can be specified over the columns. This is the constraint syntax:

tableConstraint:
	PRIMARY KEY '(' columnName [, columnName ]* ')'
	| PRIMARY GEOHASH KEY '(' latitudeColumnName , longitudeColumnName ')'
	| PRIMARY GEOHASH KEY '(' geoColumnName ')'
	| FOREIGN KEY '(' columnName [, columnName ]* ')'
		REFERENCES tableName '(' columnName [, columnName ]* ')'
			[ VALIDATE | NOVALIDATE ]
	| UNIQUE '(' columnName [, columnName ]* ')'
	| GEOHASH '(' latitudeColumnName , longitudeColumnName ')'
	| GEOHASH '(' geoColumnName ')'
	| CHECK expression

The PRIMARY KEY constraint defines the columns that constitute the primary key of a table and establishes the order in which they are arranged.

The PRIMARY GEOHASH KEY constraint defines a geohash primary key. It can be defined in two ways. The first way is providing two FLOAT or DOUBLE columns with the latitude and longitude. The second way is providing a geo column that is a STRING column using WKT markup language representing a vector geometry object. The GEOHASH constraint generates a hidden STRING column with the geohash of the indicated column(s). This hidden column is the one used as primary key.

The FOREIGN KEY constraint indicates that one or more columns within a table refer to a set of columns in another table. Following the FOREIGN KEY keyword, the specific columns in the current table are listed. Subsequently, the REFERENCES keyword is used to identify the table name of the referenced table, along with the columns referenced in the table. The referenced columns should be unique, that is, either the primary key (or prefix) that is always unique, or a set of columns that are set as unique (or a prefix of them). A FOREIGN KEY constraint results in creating an index in the referencing table with the referencing columns. This index is used to check efficiently whether a row in the referenced table has referencing rows in the referencing table. The unique requirement on the referenced columns of the referenced table guarantees that when adding a row in the referencing table it can be efficiently checked that the referenced columns exist in the referenced table by using the index created for the unique constraint.

The GEOHASH constraint indicates what columns to use to generate a hidden STRING geohash column that is used to generate a secondary index over that column.ç This geohash secondary index enables to improve geo searches such as a point is contained in a geometric shape. A geohash column can also be used as part of a composed primary key, for instance, PRIMARY KEY (country, geoHashCol). In the example, one can do searches with country and a geo condition, and thus, a search will find the first row of a country, and then perform a geo search within the country.

The UNIQUE constraint mandates that the specified column or set of columns consistently maintain distinct values across various rows in the table. This constraint results in creating and index with the columns referred in the constraint.

The CHECK constraint ensures that the stipulated condition is met for any values inserted into the associated column.

4.5. AS Clause

The AS clause is used to define the table schema based on the result set of a query statement. The table is then populated with the rows from the result set. When you specify AS clause, you may omit the list of tableElement_s since they will be taken from the resultSet, or you can provide the columnName and omit the data type from any _tableElement, in which case it renames the underlying column.

4.6. PARTITION BY Clause

The PARTITION BY clause plays a pivotal role in determining the fragmentation strategy of a table within LeanXcale, a critical factor for optimizing operational efficiency. If no partitioning is specified for a table, it means all the workload of the table will be handled by a single storage server and therefore, at most one core will be used to process that workload. If partitioning is specified, one can also indicate the criteria to distribute the partitions. The PARTITION BY clause has a crucial role in defining the fragmentation strategy employed by a table, representing a critical element for optimizing operational efficiency. When partitioning is explicitly specified, it becomes possible to define the criteria to distribute the partitions. The PARTITION BY clause syntax is as follows:

partitionByClause =
PARTITION BY
	KEY `(` columnName [, columnName]* `)` AT  tupleLiteral [, tupleLiteral]*
	| DIMENSION  columnName
		AT  literal [ , literal ]*
		| EVERY intervalLiteral
			[ KEEP intervalLiteral]
			[ SINCE dateTimeLiteral ]
			[ UNTIL dateTimeLiteral ]
		| EVERY unsignedIntegerLiteral
			[KEEP unsignedIntegerLiteral]
			[ SINCE unsignedNumericLiteral ]
			[ UNTIL unsignedNumericLiteral ]
		| HASH `(` columnName [, columnName ]* `)`

tupleLiteral=
	`(` literal [ , literal ]* `)`

LeanXcale supports several methods for table partitioning that can be used in isolation or in a combined way:

Prefix of Primary Key: The table can be partitioned based on a prefix of its primary key.
Dimension-based Partitioning: Partitioning can be performed using a specified dimension, typically another column in the table. Each PARTITION BY DIMENSION clause only has a single column. Each clause adds a new dimension to the partitioning. The column should be part of the primary key. The columns of the primary key that are dimensions used for partitioning should be a suffix of the primary key (never including the first column of the primary, since this can be achieved with PARTITION BY KEY).
Auto-partitioning on Dimension Columns of temporal or auto-increment nature.
Hash partitioning: It can be specified over one or more columns of the primary key. It computes a hash over the specified columns and the value is stored in a new hidden column named HASHID. This column becomes the last column of the table and the last column of the primary key. There are as many partitions as storage servers.

Auto-partitioning can be applied to dimension columns of type TIMESTAMP, TIME, DATE, or an auto-increment integer (it could also be an integer coding the date as YYYYMMDD). This automatic partitioning is particularly crucial for historical tables where data spans over time, such as sales or time series. As tables, especially historical ones, grow in size, accessing them becomes increasingly slower. Auto-partitioning based on temporal or auto-increment fields ensures that table fragments remain manageable in size, optimizing access efficiency. In the context of historical tables, the field keeping the temporal column or auto-increment counter is the one that can be used for auto-partitioning. Since in historic tables, it is the last fragment the one being consistently accessed, while the other fragments are infrequently accessed, auto-partitioning ensures that only the actively used fragments consume resources, mitigating inefficiencies associated with accessing large historical datasets.

When using auto-partitioning with TIMESTAMP the time span should be specified with INTERVAL. Some examples are:

INTERVAL '1' YEAR.
INTERVAL '1' YEAR TO '6' MONTH.
INTERVAL '6' MONTH.
INTERVAL '1' MONTH.
INTERVAL '90' DAY.
INTERVAL '7' DAY.
INTERVAL '1' DAY.
INTERVAL '12' HOUR.
INTERVAL '90' SECONDS.
INTERVAL '15' SECONDS.

The utilization of the PARTITION BY KEY AT clause offers the capability to partition data based on a prefix of the primary key (including the whole key),

On the other hand, the PARTITION BY DIMENSION clause allows partitioning using any other column designated as a dimension in the CREATE TABLE statement, e.g., 'state'. This feature ensures that all rows with a specified range of values in the chosen dimension are grouped into the same partition.

Within the AT subclause, specific split points for partitioning are provided. In contrast, the EVERY subclause triggers automatic partitioning at specified intervals, streamlining the process of creating partitions based on the defined dimension.

When using the EVERY subclause for auto partitioning, sometimes one might insert data that is too far from the data used for current partitions. For instance, when using a timestamp one might be using timestamps from the last two years, but sometimes introduce a 0 value as timestamp. Since auto partitioned regions cannot have gaps to enable to compute fast in which region to store the data, for this reason when a gap is created LeanXcale fills the gap will all required regions. For instance, if one has rows with timestamps from 2023 and 2024 and then one introduces a row with a timestamp 0, then all regions between 0 and 2023 will be created that takes time and consumes disk space. To avoid this, one can specify ranges where gaps will not be filled with the clauses SINCE (meaning no auto split will happen before the specified value) and UNTIL (meaning no auto split will happen after the specified value). The dateTimeLiteral can be either a date with this form:

	DATE 'YYYY-MM-DD'

Or a timestamp:

	TIMESTAMP 'YYYY-MM-DD HH:MM:SS'

The timestamp can also have fractions of second.

4.7. DISTRIBUTE BY Clause

If partitions have been specified, then it is possible to indicate how to distribute the partitions across the storage servers with the distribute by clause. It has the following syntax:

distributeBySpec=
	NO DISTRIBUTE
	| DISTRIBUTE BY HASH
	| DISTRIBUTE BY KEY
	| DISTRIBUTE BY DIMENSION dim

The distribution of data can be achieved through one several criteria in LeanXcale:

NO DISTRIBUTE. It does not distribute the table, so it will be stored at a single storage server that will handle all the workload for that table.
DISTRIBUTE BY KEY. Utilizes the primary key partitions to distribute across storage servers. Cannot be combined with other DISTRIBUTE BY clauses.
DISTRIBUTE BY DIMENSION. Uses the specified partitioned dimension column to distribute.
DISTRIBUTE BY HASH. Uses the specified hashing to distribute. Cannot be combined with other DISTRIBUTE BY clauses.
Default: If no DISTRIBUTE clause is specified, automatic distribution is used, i.e., the system decides how to distribute the partitions.

A new column, HASHID, will be appended to the primary key that will be the last column of the primary key. For insert operations, the column HASHID should be filled with 0 and the database will fill it with the right value. It will create as many partitions as the number of storage engines in the database. It should be noted that DISTRIBUTE BY HASH cannot be combined with other distribution methods. If HASH partitioning has been specified no other distribution criteria is allowed.

The HASH partitioning and distribution method is straightforward to specify as it doesn’t require knowledge of data distribution on column ranges. However, it comes with a significant tradeoff - reads become significantly more expensive since they must be performed across all partitions of the table. There is an overhead associated to compute the hash, so only add the columns really necessary for an even partitioning. Also using less columns than necessary will result in an skewed partitioning.

When using DISTRIBUTE BY KEY, it should be noted that the partitions made by primary key will be distributed among the storage servers. If there are less primary key partitions than storage servers, then only a subset of the storage servers will be used for the table. It is recommended to have as many partitions as storage servers or a multiple of them.

When using DISTRIBUTE BY DIMENSION, it happens the same as with DISTRIBUTE BY KEY. Only the partitions that have been defined will be distributed among the storage servers. If there less partitions than storage servers, only a subset of them will be used. Again it is recommended that the number of partitions used for distribution are the same as storage servers.

If NO DISTRIBUTION is indicated, it means that all the table fragments will remain on a single storage server. It should be noted that even with several storage servers, the common case, one can distribute tables across storage servers, so no distribution is need on a per table basis. Having said so, in most cases the most effective way to distribute the workload is distributing all tables across all storage servers, this is especially true for large tables over which large queries are performed and tables with high loads of any other kind, either ingestion or queries.

If partitioning is specified but no explicit distribution method is defined, automatic distribution will be performed based on the following criteria: If hash partitioning is defined it will be used as distribute criterion. Otherwise, it distributes the partitions across storage servers.

For more in-depth details, refer to the section on writing efficiently in LeanXcale.

5. DROP TABLE and TRUNCATE TABLE Statements

The DROP TABLE statement facilitates the deletion of a specified table in SQL. It includes an optional IF EXISTS clause, which, when utilized, prevents the statement from generating an error if the table does not exist. The syntax is:

dropTableStatement=
	DROP TABLE [ IF EXISTS ] name

The TRUNCATE TABLE statement in SQL is used to quickly and efficiently remove all rows from a table. Unlike the DELETE statement, which removes rows one by one and generates individual row delete operations, TRUNCATE TABLE is a more direct and faster operation for removing all data from a table. It is syntax is:

truncateTableStatement=
	TRUNCATE TABLE name

6. ALTER TABLE Statement

The ALTER TABLE statement is used to modify the structure of an existing table. It allows you to add, modify, or drop columns, constraints, indexes, or perform other changes to the table definition. Its syntax is:

alterTableStatement=
      ALTER TABLE name alterTableAction

alterTableAction=
	RENAME TO newName
	| DROP COLUMN columnName
	| DROP constraintName
	| ADD COLUMN columnName columnType
	| ADD [ CONSTRAINT name ] tableConstraint
	| ADD PARTITION
		(
		KEY `(` columnName [, columnName]* `)` AT tupleLiteral [, tupleLiteral]*
		| DIMENSION columnName AT literal [, literal]*
		)
	| (ENABLE | DISABLE) CONSTRAINT name
	| (VALIDATE | NOVALIDATE) CONSTRAINT name

7. CREATE ONLINE AGGREGATE and DROP ONLINE AGGREGATE Statements

Online aggregates is a novel feature that LeanXcale provides. It can be thought as a materialized aggregation view with very efficient real-time update (it is updated as part of the transaction modifying the parent table). Basically, an online aggregate is a materialized aggregation query that is updated as part of the transaction updating the parent table, i.e., the table from which is derived. The syntax is as follows:

createOnlineAggregate:
	CREATE [DURABLE] ONLINE AGGREGATE
		[ IF NOT EXISTS ] [ NO OLD VERSIONS] aggregateName
		AS { * | projectItem [, projectItem ]* }
	FROM tableExpression
		[WHERE booleanExpression]
			[ GROUP BY { * | projectItem [, projectItem ]* } ]
				[ ( AUTOSPLIT columnName splitperiod [ AUTOREMOVE AFTER persistperiod ]
					| SPLIT columnName EVERY numericOrIntervalExpression [ AUTOREMOVE AFTER numericOrIntervalExpression ]

The optional "DURABLE" parameter serves to determine the persistence behavior of the online aggregate in relation to the parent table. When specified, the "DURABLE" attribute ensures that the online aggregate remains intact even if the parent table undergoes deletion. In contrast, when "DURABLE" is not explicitly indicated, the deletion of the parent table triggers a cascade deletion, leading to the removal of the associated online aggregate. The "IF NOT EXISTS" clause, analogous to its usage in the CREATE TABLE statement, does not result in an error if the online aggregate table already exists. This provision ensures that the creation process proceeds smoothly without interruption in the presence of a pre-existing online aggregate table. The "NO OLD VERSIONS" clause, akin to its counterpart in the CREATE TABLE statement, serves to deactivate multi-versioning. Consequently, only the latest version of each aggregate is retained, foregoing the retrieval of versions corresponding to transaction snapshots. The "AS" clause facilitates the specification of aggregates that the online aggregate will keep updated (such as SUMs, COUNTs, MINs, MAXs) and designates the columns from the parent table over which these aggregates are computed. Online aggregates, derived from a designated parent table indicated in the FROM clause, can be further filtered using the optional WHERE clause. This clause allows for setting a boolean condition, determining which rows from the parent table are considered by the online aggregate. The optional GROUP BY clause provides a means to specify the grouping criteria for computing the aggregates. If left unspecified, the aggregation encompasses all rows that satisfy the filter condition (if specified) or all rows if no filter condition is provided. Given that online aggregates are commonly defined over historical tables, using auto-splitting becomes imperative. The AUTOSPLIT clause may be specified over a temporal column or an auto-increment integer column. The optional AUTOREMOVE clause offers the ability to specify how data should be pruned after a certain period. It is should be noted that the intervals specified in "AUTOSPLIT" and "AUTOREMOVE" should align with those of the parent table or be a multiple thereof.

An online aggregate can be deleted with DROP ONLINE AGGREGATE:

dropOnlineAggregate=
	DROP ONLINE AGGREGATE onlineaggregateName

8. CREATE INDEX and DROP INDEX Statements

Secondary indexes play a pivotal role in optimizing access to columns that do not constitute a prefix of the primary key. The creation of these indexes is accomplished through the utilization of the CREATE INDEX statement, adhering to the subsequent syntax:

createIndexStatement=
CREATE [UNIQUE] INDEX [ IF NOT EXISTS ] indexName
	ON tableName '(' columnName [, columnName ]* ')'
		[INCLUDE '(' columnName [, columnName ]* ')']
		[GLOBAL [ partitionBySpec [ distributeBySpec ] ]

The optional "UNIQUE" clause serves to specify that only a singular value can exist for each combination of columns within the index. This uniqueness constraint ensures that the indexed columns collectively have distinct values.

The index is delineated over a specific table, as denoted by the ON clause. Following the specification of the tableName, a sequence of columns constituting the secondary key is enumerated within parentheses. These columns define the structure of the secondary index, facilitating efficient retrieval of data based on the specified column combinations.

The INCLUDE clause allows to defined covered indexes, that is, indexes that include additional columns. This is useful for optimizing queries over the secondary index. Basically, it enable to recover the value of the columns without having to perform a search on the primary key.

The GLOBAL clause indicates that the index will be global. Without the GLOBAL clause, indexes are local to each table fragment. That is, each table fragment has a fragment for the secondary index that only contains secondary keys for the tuples stored in the associated fragment of the primary table. Secondary indexes are convenient because they enable to read the tuple within the same storage server. However, there is an involved tradeoff. When searching with a secondary key, all storage servers containing fragment of the table are sent the query, with the consequent overhead. Instead, global indexes are like a table and they can be stored on any storage server. In fact, one can specify the partitioning and distribution strategy as for regular tables. Searches in global indexes only impact the storage servers containing a fragment that has secondary keys in the searched range. So only the storage servers containing relevant information are contacted. However, after recovering the primary keys of the tuples satisfying the query, they have to be searched on the storage servers containing them. The best option is to combine the global index with and INCLUDE clause including all the columns that can be used by the query. In this way only the storage servers containing secondary keys in the searched interval are contacted and the query is solved without a second search step. Basically global covered indexes (GLOBAL plus INCLUDE) as vertical partitions in columnar data warehouses.

To eliminate an index, the DROP INDEX statement is employed, using the following syntax:

dropIndexStatement=
	DROP INDEX [ IF EXISTS ] indexName

The statement begins with the DROP INDEX keywords, followed by an optional "IF EXISTS" clause, which ensures that the operation does not result in an error if the specified index does not exist. The indexName parameter designates the identifier of the index intended for removal. This process effectively dismantles the specified index, revoking its association with the corresponding table and freeing up resources.

9. CREATE VIEW and DROP VIEW Statements

A view is a virtual table derived from the result of a SELECT query. Unlike a physical table, a view does not store the data itself but provides a way to represent the result of a query as if it were a table. Views are useful for simplifying complex queries, abstracting the underlying data model, and controlling access to specific columns or rows of a table. It has the following syntax:

createViewStatement=
	CREATE VIEW [ IF NOT EXISTS ] viewName AS query

That creates a view named "viewName" from the result of the query provided in the AS clause.

A view can be removed by the DROP VIEW statement:

dropViewStatement:
      DROP VIEW [ IF EXISTS ] viewName

10. Sequences

A sequence is an object used to generate unique numeric values in a specified order. Sequences are commonly used to produce primary key values for tables, ensuring that each new record has a unique identifier. Unlike auto-increment columns, sequences provide greater flexibility and can be shared among multiple tables. Sequences use a BIGINT type (long integer). They are created by means of the CREATE SEQUENCE statement:

createSequenceStatement=
	CREATE SEQUENCE [ IF NOT EXISTS ] sequenceName sequenceModifier*

sequenceModifier=
	AS type
		|   START WITH initialValue
		|   INCREMENT BY incrementValue
		|   CACHE cacheSize
		|	NO FETCH

It should be noted that LeanXcale is a distributed database and might have multiple query engines and have multiple storage servers. A particular sequence will be handled by one storage server. The storage server writes each sequence value assigned to persistent storage. When a sequence is used intensely, one can regulate this behavior assigning a value to the CACHE modifier that will have the effect of writing persistently one out of CACHE times increasing significantly the efficiency. It should be noted in case of a crash of that storage server the sequence might have gaps in the following values after the recovery.

The client application interacts with a query engine. The query engine consumes a sequence value by invoking storage server in charge of that sequence. Since this interaction implies some overhead, the CACHE modifier can be used to make the query engine(s) fetch a number of values from the sequence to consume them locally. This means that in a multi-node configuration with multiple query engines each query engine will fetch a number of values. Thus, values of the sequence will not be necessarily consecutive along time. Thus, the CACHE modifier also affects the number of fetched values by each query engine. It is possible to prevent fetching to guarantee that values of the sequence are consecutive (that will be less efficient) by specifying the NO FETCH modifier.

The sequenceModifiers specify the behavior of the sequence. The START WITH clause indicates the first value to be given in the sequence. The INCREMENT BY clause sets the step size by which the sequence increases or decreases. The CACHE clause makes that query engine instance fetches a stripe of cacheSize length to assign sequence identifiers. In this way, the query engine does not have to request to a storage engine the next value of the sequence every time. Additionally, the storage server handling the sequence will persist the value of the sequence every CACHE times instead of every time. This is the preferred way of using sequences, with the CACHE option. The NO FETCH modifier disables the fetching in the query engine and guarantees that identifiers are consecutive (it has a negative performance impact).

One can consume the next sequence value by means of:

    next value for sequenceName

For instance, for a sequence called mySeq:

   select next value for mySeq;

will consume the next value of the sequence mySeq.

It is also possible to check the current sequence value (the last value consumed) of a sequence without fetching (with NO FETCH modifier) by:

   current value for seqName

For the previous example of mySeq sequence the current sequence value can obtained with:

   select current value for mySeq

Note that with fetching the concept of current value is fuzzy since each query engine will have a number of values of the sequence fetched and there is no notion of current sequence value.

A sequence can be removed with the DROP SEQUENCE statement:

   dropSequenceStatement:
   DROP SEQUENCE [ IF EXISTS ] sequenceName

11. CREATE USER, ALTER USER, and DROP USER Statements

New users can be created by means of the CREATE USER statement:

createUserStatement=
	CREATE USER userName IDENTIFIED BY 'password'

Leanxcale has two predefine users: LXADMIN and NONE. Users are case sensitive. When the are written without quotes they are always stored as capitalized. When they are written between quotes the capitalization within the quotes is used. Note that in the connection string of the drivers (JDBC, Python, ODBC) the user names are always case sensitive.

The password of a user can be changed with:

alterUserStatement=
	ALTER USER userName IDENTIFIED BY 'password'

A user can be removed by:

dropUserStatement=
	DROP USER userName

12. GRANT and REVOKE Statements

Permissions can be granted by:

grantStatement=
	GRANT permissionList
		[ON (TABLE | SEQUENCE | SCHEMA) name]
			TO userList
permissionList=
	permission [, permission]*

The grant is for all the database unless the clause ON is provided. If the clause ON is provided then the grant is local to the object provided, that is, the table, sequence or schema specified in the ON clause.

And revoked by:

revokeStatement=
	REVOKE permissionList
		[ON (TABLE | SEQUENCE | SCHEMA) name]
			TO userList

A permission can take three forms:

permission=
	action databaseObject
	| usrId
	| roleId

If it is a usrId or roleId the permission is actually the permission list associated to the user or role, otherwise, it is the permission to execute an action over a database object.

The supported actions are:

action=
	CREATE | ALTER | DROP | SELECT | USAGE | INSERT | DELETE | UPDATE | REFERENCES | COMMENT | BACKUP | ALL PRIVILEGES

The supported database objects are:

databaseObjects=
	TABLE | INDEX | VIEW | SEQUENCE | TRIGGER | PROCEDURE | ANY

13. CREATE ROLE and DROP ROLE Statements

A role is a collection of permissions. In that sense they are similar to a user. The syntax is:

createRoleStatement=
	CREATE ROLE roleName IDENTIFIED BY 'password'

Permissions can be granted to a role and a user can be granted roles that is equivalent to grant to the user each of the permissions in the role.

A role can be deleted with:

dropRoleStatement=
	DROP ROLE roleName

14. CREATE TRIGGER, DROP TRIGGER, ENABLE TRIGGER, and DISABLE TRIGGER Statements

Triggers are a predefined action that is automatically executed ("triggered") in response to certain events on a particular table or view. These events can include data modification statements (such as INSERT, UPDATE, DELETE) or specific system events. They are created by means of CREATE TRIGGER with the following syntax:

createTrigger=
	CREATE TRIGGER [ IF NOT EXISTS ] triggerName
		(BEFORE | AFTER)
		[INSERT] [UPDATE] [DELETE]
	ON tableName
	FOR EACH ROW EXECUTE triggerFunction ['(' stringLiteral ')']
	[PRIORITY intLiteral]

BEFORE and AFTER are used to indicate whether the trigger has to be executed before the triggering event or after. INSERT, UPDATE, and DELETE specify the triggering event. The ON clause specifies on which table the trigger is defined. triggerFunction LeanXcale currently only supports row level triggers, that is, triggers associated to row events. For this reason, the only form allowed is FOR EACH ROW EXECUTE that determines that is a row level trigger. triggerFunction is the name of a Java function that contains the code of the trigger. PRIORITY indicates the priority of the trigger, in case, there are several defined, they are executed in priority order.

Triggers can be removed with DROP TRIGGER:

dropTriggerStatement=
	DROP TRIGGER triggerName

Triggers can be enabled by means of:

enableTriggerStatement=
	ENABLE TRIGGER triggerName ON tableName

And disabled with:

disableTriggerStatement=
	DISABLE TRIGGER triggerName ON tableName

15. Auditing (Audit Logs)

In this section we describe how to activate and deactivate the different auditing levels.

15.1. Auditing Levels

Auditing is set for all users, unless a user is specified. LeanXcale allows to set the auditing level at database level, table level, and schema level.

For the level chosen then the auditing granularity can be set per session (session is the default granularity, if none specified), per transaction, or per operation. Session granularity means that accesses to a table is logged once per session, the first time the access happen in the session. Transaction granularity means that accesses are logged once per transaction, that is, the first access that happens to an audited table. *Operation granularity means that only accesses with a specific kind of operation will be audited: READ, WRITE, DDL, and ANY. In case all kinds of operations should be audited, then ANY can be used.

15.2. Auditing SQL Syntax

audit=
	[NO] AUDIT [USER userName] [ (TABLE tableName) | (SCHEMA schemaName) ]
		[ BY
		( SESSION
		| TRANSACTION
		| [ READ | WRITE | READ WRITE | DDL | ANY ]
		) ]

To activate audit use AUDIT and to deactivate audit use NO AUDIT. The auditing can be set for a particular user using the USER clause, or for all users if the USER clause is omitted.

To audit the full database to which you connected omit table and schema clauses, and to audit a full schema use the SCHEMA clause and to audit a particular table use the TABLE clause. Note that a tableName no qualified with the schema name is referring to the schema indicated in the connection. To refer to tables in another schema the table name has to be qualified with the schema name: schemaName.tableName. When no audit granularity is specified, that is, there is no BY clause, the default audit granularity will be SESSION.

The BY clause enables to specify the audit granularity as SESSION, TRANSACTION, or operation. In the last case, the operation kind to be audited is indicated, or ANY if all kinds of operations should be audited.

15.3. Audit Logs

The audit log file contains auditing entries, including user authentication. User authentication is always enabled and cannot be disabled. When there is no audit, one can still find the authentication audit records in the lxqe log. When the audit is on, authentication audit records will also appear in the audit log.

Logs are accessed via lx logs command. The command provides the -g option that performs a grep with the pattern specified after -g. For example, to locate authentication entries for the user lxadmin the flag -g (for grep) can be used as shown here:

	unix$ lx logs -g auth.*lxadmin audit
	logs...
	atlantis: [
		log/audit.240221.0708.log	1.75K [
		1:	2024-02-21 07:08:10 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		2:	2024-02-21 07:08:12 lxqe100: audit: auth: db-USR2: 127.0.0.1: local: yes
		3:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		4:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		5:	2024-02-21 07:08:13 lxqe100: audit: auth: lxadmin: 127.0.0.1: local: yes
		...
		]
	]

The user names include both the database and the user name, as a convenience. An exception to this rule is lxadmin that is a user across all databases, while the rest of the users are local to one database.

Authentication failures are reported as follows:

2024-02-29 06:40:52 lxqe100: audit: auth: db-FOO: 127.0.0.1:39018: local: no:auth failed

Another examples is:

	AUDIT TABLE MYTBL
	AUDIT USER USR1
	AUDIT USER USR1 TABLE MYTBL

And after trying to update the table MYTABL by USR1, the audit log file will include this audit record:

2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: /db/USR1/tbl/MYTBL

When setting auditing with operation granularity, the audit log will contain audit records like:

2024-02-29 06:40:51 lxqe100: audit: write: db-USR1: SELECT count(*) FROM MYTBL

When permission is denied to execute a statement, an audit record is added, if audit is enabled:

2024-02-29 06:40:51 lxqe100: audit: perm: db-TEST1: /db/APP/tbl/PERSONS: select on table: permission denied

Authentication audit records are as follows:

2024-02-16 08:45:08,752 INFO lxqe: authenticate: lxadmin: local: yes

report that the user lxadmin (or whoever it was) was authenticated by the local (i.e., the DB) user entry.

When using LDAP, the audit record shows the following:

2024-02-16 08:45:08,752 INFO lxqe: authenticate: db-USR1: ldap: yes

This reports both the database used (db) and the user involved (USR1).

Authentication errors are reported in a similar way:

2024-02-16 08:48:09,992 INFO lxqe: authenticate: db-USR2: ldap: no: [LDAP: error code 49 - Invalid Credentials]

SQL Data Manipulation Language (DML) Reference

This page describes LeanXcale’s SQL DML.

The SQL syntax is described using an EBNF-like form.

1. Isolation Levels

Database transactions provide the so-called ACID properties. The I stands for isolation and it takes care of the consistency of the data in the advent of concurrent transactions updating the same row(s). The higher the consistency level the most consistency is attained, but it is more costly to execute transactions. The levels from less to more consistency are (short names between parenthesis):

Loader: Mode to load data without any other parallel activity. It deactivates conflict checking and logging, a transaction gets as start timestamp the current snapshot. It is highly efficient for loading.
Raw_read_committed: (rawrc) This isolation level ensures that a transaction can only see committed changes made by other transactions. It also eliminates dirty reads. When the transaction commits its updates become visible, its updates become visible as they reach the involved storage servers, so it would be possible for another transaction to see only part of the updates made by a transaction. No conflicts are checked (writes over the same row by parallel transactions, transactions overlapping in time are allowed).
Read_committed (rc): This isolation level ensures that a transaction can only see committed changes made by other transactions. It also eliminates dirty reads. However, it allows non-repeatable reads, as a transaction may see different values for the same record within the same transaction if other transactions commit changes. When the transaction starts it waits for the snapshot to reach the commit timestamp of the last transaction committed in the session, and then it gets the snapshot as start timestamp.
Snapshot_isolation. It guarantees repeatable reads and no phantom reads. This isolation levels is called serializability in other database systems (incorrectly). Only write skew anomalies might happen. In case write skew might happen, it can be avoided by using select for update. The trasaction when started waits for the snapshot to be equal or higher than the commit timestamp of the last committed transaction in the session. Then it gets the current snapshot as start timestamp.

The default isolation mode is read_committed. The isolation mode is specified in the connection string in the driver with the property mode using either the long or short name of the isolation mode. The isolation mode can be changed via the SQL statement SET ISOLATION LEVEL described in Section 5

It should be noted that leanxcale provides fully optimistic concurrency control based on MVCC, this means, that in the advent of concurrent transactions (such that the start of one is between the start and commit of another) updating the same row results in aborting the second transaction performing the update. Other database systems in addition to MVCC has also locking. In those databases with snapshot isolation and serializability concurrent transactions modying the same row will also result in abortin the second transaction performing the update. However, in read committed mode, they don’t abort any of the transactions, just delay the second one till the first one is committed or rolled back. When a transaction is aborted due to a conflict the following exception is raised: … ConflictManagerException …

In case of rollback due to a conflict, the application should handle the exception and retry the transaction.

2. DML SQL Statements

The DML SQL statements supported are:

statement=
	queryStatement
	| upsertStatement
	| insertStatement
	| updateStatement
	| deleteStatement
	| mergeStatement
	| setIsolationLevelStatement
	| explainStatement
	| execStatement
	| batchStatement

Multiple SQL statements can be provided by separating them with semicolons:

statementList=
	statement; [statement';']*

3. Query Statement

A query statement can have the following forms:

queryStatement=
	( selectStatement
	| selectForUpdateStatement
	| unionStatement
	| intersectStatement
	| exceptStatement
	| intersectStatement
	| withStatement
	| valuesStatement )
		[ limitClause ]
		[ offsetClause ]
		[ fetchClause ]

3.1. SELECT Statement

The most common statement is the SELECT statement. The SELECT is accepted with FROM (the standard) and also without FROM to have compatibility with those SQL dialects that accept the SELECT without FROM:

selectStatement=
	( SELECT [ ALL | DISTINCT ]
		{ * | projectItem [, projectItem ]* }
	[ FROM tableExpression ]
	[ WHERE booleanExpression ]
	[ GROUP BY { groupItem [, groupItem ]* } ]
	[ HAVING booleanExpression ]
	[ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]
	[ FOR UPDATE]
	[ ORDER BY item(, item)*]
	[ limitClause
	| OffsetClause [limitClause | FetchClause]
	| FetchClause ]

3.1.1. ALL/DISTINCT clause

Specifying the DISTINCT clause, filters out the same rows and returns only different (i.e., distinct) values in the resultset. The ALL clause returns all values, i.e., including repeated rows, it is the default behavior.

After the optional ALL/DISTINCT clause the list of columns to be projected is provided:

projectItem=
      expression [ [ AS ] columnAlias ]
  |   tableAlias . *

3.1.2. FROM Clause

In the optional FROM clause a tableExpression is provided that is basically the list of tables used in the select and how they should be joined (kind of join, join condition):

tableExpression=
	tableReference [, tableReference ]*
	| tableExpression [ NATURAL ]
	[ ( LEFT | RIGHT | FULL ) [ OUTER ] ]
		JOIN tableExpression [ joinCondition ]
	| tableExpression CROSS JOIN tableExpression

tableReference=
	tablePrimary
	[ matchRecognize ]
	[ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]

tablePrimary=
	[ [ catalogName . ] schemaName . ] tableName
		'(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'
	| [ LATERAL ] '(' queryStatement ')'
	| UNNEST '(' expression ')' [ WITH ORDINALITY ]
	| [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]* ')' ')'

joinCondition=
	ON booleanExpression
	| USING '(' column [, column ]* ')'

A SELECT without FROM is equivalent to VALUES (included for compatibility with PosgreSQL), as for example:

SELECT 1, CURRENT_TIMESTAMP;

This just yields the value 1 and the current timestamp in the database server.

3.1.3. WHERE Clause

The optional WHERE clause enables to provide a filter condition to select only the rows that fulfill the condition.

[ WHERE booleanExpression ]

3.1.4. GROUP BY and HAVING Clauses

The optional GROUP BY clause:

[ GROUP BY { groupItem [, groupItem ]* } ]
 	[ HAVING booleanExpression ]

enables to aggregate rows with the same values in a set of columns into summary rows.

groupItem=
	expression
	| '(' ')'
	| '(' expression [, expression ]* ')'
	| CUBE '(' expression [, expression ]* ')'
	| ROLLUP '(' expression [, expression ]* ')'
	| GROUPING SETS '(' groupItem [, groupItem ]* ')'

The optional HAVING clause the HAVING clause is used in conjunction with the GROUP BY clause to filter the results of a query based on aggregate functions applied to grouped data. The reason is because the WHERE keyword cannot be used with aggregate functions.

In the SELECT, HAVING and ORDER BY clauses of an aggregate query, all expressions must be constant within the current group (that is, grouping constants as defined by the GROUP BY clause, or constants), or aggregate functions, or a combination of constants and aggregate functions. Aggregate and grouping functions may only appear in an aggregate query, and only in a SELECT, HAVING or ORDER BY clause.

3.1.5. WINDOW Clause

The optional WINDOW clause:

[ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]

is used in conjunction with window functions to define a window of rows over which the window function will operate. Window functions perform a calculation across a set of table rows related to the current row. The WINDOW clause allows you to specify the window frame or partition over which the window function operates:

window=
	windowName
	| windowSpec

windowSpec=
	'('
	[ windowName ]
	[ ORDER BY orderItem [, orderItem ]* ]
	[ PARTITION BY expression [, expression ]* ]
	[ RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
		| ROWS numericExpression { PRECEDING | FOLLOWING }
	]
	')'

3.1.6. ORDER BY Clause

The ORDER BY clause allows to sort the resultset by a set of columns and/or expressions:

orderByClause=
	ORDER BY expression [DESC | ASC] [NULLS (FIRST | LAST)] [, expression [DESC | ASC]]*

For each expression (it can be just a column name) one can optionally specify if the order is ascending (ASC) or descending (DESC). One can also optionally specify whether NULLS should appear before other values (NULLS FIRST) or after other values (NULLS LAST). By default, ordering is ascending and nulls appears at the end (i.e., ASC NULLS LAST for all expressions).

3.1.7. LIMIT Clause

The optional LIMIT clause is used to restrict the number of rows returned by a query. Additionally, an additional OFFSET clause can be used which allows to skip a certain number of rows before starting to return rows. The syntax is:

limitClause=
		[ LIMIT (count | ALL) ]

3.1.8. OFFSET Clause

The optional OFFSET clause:

offsetClause =
	[ OFFSET start ( ROW | ROWS ) ]

is used in conjunction with the LIMIT clause to specify the starting point within the result set for fetching rows. It allows you to skip a certain number of rows before beginning to return rows.

3.1.9. FETCH Clause

The optional FETCH clause:

fetchClause =
	[ FETCH ( FIRST | NEXT ) [ count ] ( ROW | ROWS ) ONLY ]

is used to retrieve a specified number of rows from the result set produced by a query. It is often used in conjunction with the OFFSET clause for pagination purposes, where you want to retrieve a specific subset of rows from a larger result set. The FIRST keyword indicates that you want to retrieve the first set of rows from the result set. The NEXT keyword indicates that you want to retrieve the next set of rows from the result set. The ONLY keyword is used in conjunction with the FETCH clause to specify that only the specified number of rows should be returned, and no additional rows should be fetched beyond that limit.

3.1.10. FOR UPDATE Clause

Snapshot isolation isolation level can suffer from write-skew anomaly in some scenarios, when an integrity constraint should be preserved among 2 or more rows, updating only a subset of them. An example of write skew is an operation that can withdraw funds from any two accounts for the same customer as far as the joint balance is positive (i.e. the sum bot both account balances is greater or equal than zero).

Assume that initially two accounts from the same customer with customerid 27 has a joint balance of 10:

+------------+---------+------------+
| ACCOUNTID | BALANCE | CUSTOMERID |
+------------+---------+------------+
| 1          | 3       | 27         |
| 2          | 7       | 27         |
+------------+---------+------------+

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

We have 2 concurrent transactions with isolation level set to snapsthot isolation and with AUTOCOMMIT set to OFF. The first transaction wants to withdraw from accountid 1 an amount of 5. It checks the current joint balance:

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

Since 5 ⇐ 10 it will proceed to subtract 5 from accountid 1. However, before the transaction commits, a second transaction wants to withdraw from accountid 2 an amount of 6. So it checks the current joint balance:

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

Since 6 ⇐ 10 it will proceed to subtract 6 from accountid 2.

Now the first transaction updates accountid 1 with an amount 3-5=-2 and commits:

UPDATE your_table_name
SET BALANCE = -2
WHERE ACCOUNTID = 1;
COMMIT;

Now the second transaction updates accountid 2 with an amount 5-6=-1 and commits:

UPDATE your_table_name
SET BALANCE = -1
WHERE ACCOUNTID = 2;
COMMIT;

The final result after committing both transactions is:

+------------+---------+------------+
| ACCOUNTID  | BALANCE | CUSTOMERID |
+------------+---------+------------+
| 1          | -1      | 27         |
| 2          | -2      | 27         |
+------------+---------+------------+

That violates the integrity constraint of keeping a non negative joint balance. The reason is that both transactions observe the same snapshot.

By using SELECT FOR UPDATE in both transactions, the first transaction doing the update over the pair of involved rows will succeed, while the second one will be aborted because the UPDATE of the first transaction over accountid 1 will conflict with the SELECT FOR UPDATE performed by the second transaction over accountid 1, and thus, resulting in its abortion.

SELECT FOR UPDATE has the same syntax as SELECT except using the clause FOR UPDATE at the end of the SELECT.

3.2. UNION, INTERSECT, and EXCEPT Statements

The union, intersect, and except statements can combine resultsets in different ways performing the set operations union, intersect and difference, respectively, and have the following syntax:

unionStatement=
 	queryStatement UNION [ ALL | DISTINCT ] queryStatement
intersectStatement=
	queryStatement INTERSECT [ ALL | DISTINCT ] queryStatement
exceptStatement=
	queryStatement EXCEPT [ ALL | DISTINCT ] queryStatement

MINUS is not supported but EXCEPT is equivalent to MINUS.

3.3. WITH Statement

The second form of a query has a with clause that has the following syntax:

withStatement=
	WITH withItem [, withItem]* queryStatement
withItem=
	name [ '(' column [, column ]* ')' ]
		AS '(' queryStatement ')'....

3.4. VALUES Statement

The VALUES statement basically defines a table literal with one or more rows, that is, it is a table value constructor. Its syntax is:

values=
	VALUES rowExpression [, rowExpression ]*

Where a rowExpression is specified as follows:

rowExpression=
	expression
	| [ROW] '(' expression [, expression]* ')'

3.5. Subqueries

A scalar sub-query is a sub-query used as an expression. If the sub-query returns no rows, the value is NULL; if it returns more than one row, it is an error. IN, EXISTS and scalar sub-queries can occur in any place where an expression can occur (such as the SELECT clause, WHERE clause, ON clause of a JOIN, or as an argument to an aggregate function).

An IN, EXISTS, or scalar sub-query may be correlated, that is, it may refer to tables in the FROM clause of an enclosing query.

4. UPSERT, INSERT, and UPDATE Statements

For modifying the database there are four kinds of statements allowing to upsert, insert, update, and delete rows from the database. Upsert updates or insert depending whether a row exists with the provided primary key or not. Insert adds a new row with that primary key, if there is no row with that primary key already in the table (otherwise it returns an error). Update modifies a row with a particular primary key, if there is a row in the table with that primary (otherwise it returns an error). Upsert is preferred over insert and update since it is the most efficient. Insert and update both require to perform an additional read to check whether the row exists or not, basically duplicating the cost of the statement. Their syntax is as follows:

upsertStatement=
	UPSERT INTO tablePrimary
		[ '(' column [, column ]* ')' ]
			queryStatement

insertStatement=
	INSERT INTO tablePrimary
		[ '(' column [, column ]* ')' ]
 			queryStatement

updateStatement=
 	UPDATE tablePrimary
 		SET assign [, assign ]*
 			[ WHERE booleanExpression ]

deleteStatement=
	DELETE FROM tablePrimary [ [ AS ] alias ]
		[ WHERE booleanExpression ]

In INSERT and UPSERT statements if the list of target columns is not provided, then the statement must have the same number of columns as the target table and the order of the fields in the VALUES clause must be the same as the order of the fields in the table.

5. SET ISOLATION LEVEL Statement

The current isolation level can be changed within the transaction using the SET ISOLATION LEVEL statement that has the following syntax:

setIsolationLevelStatement=
	SET ISOLATION LEVEL [TRANSACTION]
		LOADER | NONE
		| RAW READ COMMITTED | RRC | READ UNCOMMITTED
		| READ COMMITTED | RC
		| SNAPSHOT ISOLATION | SI | SERIALIZABLE | REPEATABLE READ

LeanXcale has multiversion concurrency control that provides highly efficient isolation management between reads and writes. The isolation levels were defined multiversion concurrency control was invented and are based on the notion of locking. LeanXcale supports 4 isolation levels:

LOADER (same as NONE). This mode is similar to the NONE standard mode in that it does not have neither concurrency control nor logging. It should be used for massive loads when nobody else is using the database, for instance, a night batch ingesting massive data.
RAW READ COMMITTED (same as RRC or READ UNCOMMITTED). This mode is a little more relaxed than READ COMMITTED and allows to read the most current version of any row. It is interesting for applications where one does not want to read from a consistent snapshot in the commit order, but instead the most recent value of the row. When setting READ UNCOMMITTED, this is the mode used that provides more isolation.
READ COMMITTED (same as RC). For each SQL statement it uses the most recent consistent snapshot (i.e., longest prefix of committed transactions in commit timestamp order). The equivalent in multiversion concurrency control of locking read committed.
SNAPSHOT ISOLATION (same as SI, SERIALIZABLE, REPEATABLE READ, SESSION, SESSION CONSISTENCY). The transaction at start time it gets the current snapshot and performs all reads within the transaction from this snapshot plus the updates performed by the transaction itself. It is roughly similar to SERIALIZABLE.

Note that the optional [TRANSACTION] keyword is just syntactic sugar for compatibility with other SQL dialects in which is mandatory to use it.

6. EXPLAIN Statement

The EXPLAIN statement information about the execution plan of a SQL query. The execution plan describes how the database engine will execute the query and retrieve the required data. This information can be helpful for database administrators, developers, and analysts to optimize and tune their queries for better performance:

explainStatement=
	EXPLAIN PLAN
		[ WITH TYPE | WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION ]
		[ EXCLUDING ATTRIBUTES | INCLUDING [ ALL ] ATTRIBUTES ]
		[ AS JSON | AS XML ]
			FOR ( queryStatement | insertStatement | updateStatement | deleteStatement )

7. SQL Hints

A hint is an instruction to the optimizer. When writing SQL, you may know information about the data unknown to the optimizer. Hints enable you to force decisions otherwise made by the optimizer.

Planner enforcers: there’s no perfect planner, so it makes sense to implement hints to allow user better control the execution. For instance: “never merge this subquery with others” (/+ no_merge */). “treat those tables as leading ones” (/+ leading */) to affect join ordering, etc;
Append meta data/statistics: some statistics like “table index for scan” or “skew info of some shuffle keys” are somehow dynamic for the query, it would be very convenient to config them with hints because the planning metadata from the planner might not be sufficiently accurate.
Operator resource constraints: for many cases, a default resource configuration is assigned for the execution operators, i.e., min parallelism, memory (resource consuming UDF), special resource requirement (GPU or SSD disk), …

LeanXcale supports two kinds of hints:

Query Hint: right after the SELECT keyword;
Table Hint: right after the referenced table name.

This is the syntax:

query=
	SELECT /*+ hints */
		...
	FROM
		tableName /*+ hints */
	JOIN
		tableName /*+ hints */
		...

hints=
	hintItem[, hintItem ]*

hintItem=
	hintName
	| hintName(optionKey=optionVal[, optionKey=optionVal ]*)
	| hintName(hintOption [, hintOption ]*)

optionKey=
	simpleIdentifier
	| stringLiteral

optionVal=
	stringLiteral

hintOption=
	simpleIdentifier
	| numericLiteral
	| stringLiteral

8. EXEC Statement

The EXEC statement enables to invoke a table function. Its syntax is:

execStatement=
	EXEC '(' [ nonQueryExpression [, nonQueryExpression]* ] ')'

The parameters can be any expression that is not a query returning a value of the parameter type.

9. BATCH Statement

When auto-commit is set, and requesting an insert/update/delete operation from the result of a select statement with a large resultset, one might one instead of performing a large commit of the whole set of updates, a commit for smaller batches of updates. This can be performed with the BATCH statement that changes the implicit behavior of auto-commit. Instead of committing one time at the end of the statement, it will commit every number of indicate updates to commit every number of times indicated in the parameter of the BATCH statement:

batchStatement=
	BATCH unsignedIntLiteral   |   BATCH unsignedIntLiteral (sqlInsert | sqlUpdate | sqlDelete)

10. Identifiers

Identifiers are the names of tables, columns and other metadata elements used in an SQL query. Identifiers can be quoted or unquoted. Quoted identifiers, such as "Employee Name", start and end with double quotes. Unquoted identifiers, such as employee, must start with a letter and can only contain letters, digits, and underscores. They are implicitly converted to upper case. Identifiers are case insensitive when unquoted (actually, it is as they were written in CAPS), and case sensitive when quoted. Quoted identifiers may contain virtually any character, including spaces and other punctuation. If you wish to include a double quote in an identifier, use another double quote to escape it, as in this example: "An employee called ""Fred"".". Matching identifiers to the name of the referenced object is case-sensitive.

11. Data Types

11.1. Scalar Data Types

Data type Description Range and example literals Can be stored

Data type	Description	Range and example literals	Can be stored
BOOLEAN	Logical values	Values: TRUE, FALSE, UNKNOWN	Yes
SMALLINT	2 byte signed integer	Range is -32768 to 32767	Yes
INTEGER, INT	4 byte signed integer	Range is -2147483648 to 2147483647	Yes
BIGINT	8 byte signed integer	Range is -9223372036854775808 to 9223372036854775807	Yes
REAL, FLOAT	4 byte floating point	6 decimal digits precision	Yes
DOUBLE	8 byte floating point	15 decimal digits precision	Yes
DECIMAL(p, s)	Fixed point	Example: 123.45 is a DECIMAL(5, 2) value.	Yes
NUMERIC	Fixed point		Yes
CHAR(n), CHARACTER(n)	Fixed-width character string	‘Hello’, ‘’ (empty string), _latin1’Hello’, n’Hello’, _UTF16’Hello’, ‘Hello’ ‘there’ (literal split into multiple parts)	Yes
VARCHAR(n), CHARACTER VARYING(n)	Variable-length character string	As CHAR(n)	Yes
BINARY(n)	Fixed-width binary string	x’45F0AB’, x’’ (empty binary string), x’AB’ ‘CD’ (multi-part binary string literal)	Yes
VARBINARY(n), BINARY VARYING(n)	Variable-length binary string	As BINARY(n)	Yes
DATE	Date	Example: DATE ‘1969-07-20’	Yes
TIME	Time of day	Example: TIME ‘20:17:40’	Yes. This datatype has milliseconds precision
TIMESTAMP	Date and time	Example: TIMESTAMP ‘1969-07-20 20:17:40’	Yes. This datatype has microseconds precision
INTERVAL timeUnit [ TO timeUnit ]	Date time interval	Examples: INTERVAL ‘1-5’ YEAR TO MONTH, INTERVAL ‘45’ DAY, INTERVAL ‘1 2:34:56.789’ DAY TO SECOND	No
TIMESTAMP WITH TIME ZONE	Date and time with time zone	Example: TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’	Yes. This datatype has milliseconds precision
TIMESTAMP WITH LOCAL TIME ZONE	Date and time with time zone	Example: TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’	No
GEOMETRY	Geometry	Examples: ST_GeomFromText(‘POINT (30 10)’)	No
BIGINT ARRAY	Array of longs	`[1, 2, 3]`	Yes
DOUBLE ARRAY	Array of doubles	`[1.1, 2.2, 3.3]`	Yes
VARCHAR ARRAY	Array of strings	`['string1', 'string2']`	Yes
CIDR	IPv4 or IPv6 network specification	Example: cidr'128.1.2.0/24'	Yes

BOOLEAN

Logical values

Values: TRUE, FALSE, UNKNOWN

Yes

SMALLINT

2 byte signed integer

Range is -32768 to 32767

Yes

INTEGER, INT

4 byte signed integer

Range is -2147483648 to 2147483647

Yes

BIGINT

8 byte signed integer

Range is -9223372036854775808 to 9223372036854775807

Yes

REAL, FLOAT

4 byte floating point

6 decimal digits precision

Yes

DOUBLE

8 byte floating point

15 decimal digits precision

Yes

DECIMAL(p, s)

Fixed point

Example: 123.45 is a DECIMAL(5, 2) value.

Yes

NUMERIC

Fixed point

Yes

CHAR(n), CHARACTER(n)

Fixed-width character string

‘Hello’, ‘’ (empty string), _latin1’Hello’, n’Hello’, _UTF16’Hello’, ‘Hello’ ‘there’ (literal split into multiple parts)

Yes

VARCHAR(n), CHARACTER VARYING(n)

Variable-length character string

As CHAR(n)

Yes

BINARY(n)

Fixed-width binary string

x’45F0AB’, x’’ (empty binary string), x’AB’ ‘CD’ (multi-part binary string literal)

Yes

VARBINARY(n), BINARY VARYING(n)

Variable-length binary string

As BINARY(n)

Yes

DATE

Date

Example: DATE ‘1969-07-20’

Yes

TIME

Time of day

Example: TIME ‘20:17:40’

Yes. This datatype has milliseconds precision

TIMESTAMP

Date and time

Example: TIMESTAMP ‘1969-07-20 20:17:40’

Yes. This datatype has microseconds precision

INTERVAL timeUnit [ TO timeUnit ]

Date time interval

Examples: INTERVAL ‘1-5’ YEAR TO MONTH, INTERVAL ‘45’ DAY, INTERVAL ‘1 2:34:56.789’ DAY TO SECOND

TIMESTAMP WITH TIME ZONE

Date and time with time zone

Example: TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’

Yes. This datatype has milliseconds precision

TIMESTAMP WITH LOCAL TIME ZONE

Date and time with time zone

Example: TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’

GEOMETRY

Geometry

Examples: ST_GeomFromText(‘POINT (30 10)’)

BIGINT ARRAY

Array of longs

[1, 2, 3]

Yes

DOUBLE ARRAY

Array of doubles

[1.1, 2.2, 3.3]

Yes

VARCHAR ARRAY

Array of strings

['string1', 'string2']

Yes

CIDR

IPv4 or IPv6 network specification

Example: cidr'128.1.2.0/24'

Yes

11.2. Non Scalar Types

LeanXcale currently supports the following non scalar types:

Type	Description	Example literals	Can be stored
ARRAY	Ordered, contiguous collection that may contain duplicates	Example: varchar(10) array	Yes
MULTISET	Unordered collection that may contain duplicates	Example: int multiset	No
MAP	Collection of keys mapped to values		No
CURSOR	Cursor over the result of executing a query		No

Type

Description

Example literals

Can be stored

ARRAY

Ordered, contiguous collection that may contain duplicates

Example: varchar(10) array

Yes

MULTISET

Unordered collection that may contain duplicates

Example: int multiset

MAP

Collection of keys mapped to values

CURSOR

Cursor over the result of executing a query

11.2.1. Array Data Type

A sentence to create a table with array types could be the following:

CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR ARRAY, PRIMARY KEY(id));

Or the following, if you want to restrict the number of characters of each VARCHAR position in the VARCHAR ARRAY.

CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR (20) ARRAY, PRIMARY KEY(id));

An array literal has the following syntax:

arrayLiteral=
	ARRAY '[' value [, value]* ']'

If we query the above sample table TARRY, the result will be shown this way:

SELECT * FROM tarry
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID |                     LONGS                      |                    DOUBLES                     |         STRINGS         |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| 1  | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
+----+------------------------------------------------+------------------------------------------------+-------------------------+

In order to insert a NULL value to an array field, can be done in two ways:

Replace the NULL keyword for the emptyList() function.

INSERT INTO tarry VALUES (2, ARRAY[2334589765239847563, 0, 44], emptyList(), ARRAY['CUENCA','BILBAO']);

Specify only the columns in which we want to insert values. In unlisted columns, NULL values will be inserted.

INSERT INTO tarry (id, longs, strings) VALUES (2, array[2334589765239847563, 0, 44], ARRAY['CUENCA','BILBAO']);

The result will be:

+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID |                     LONGS                      |                    DOUBLES                     |         STRINGS         |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| 1  | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
| 2  | [2334589765239847563, 0, 44]                   |                                                | [CUENCA, BILBAO]        |
+----+------------------------------------------------+------------------------------------------------+-------------------------+

11.2.2. Non Scalar Value Constructors

The next table summarizes the constructors for non scalar values:

Operator syntax	Description
ROW (value [, value ]*)	Creates a row from a list of values.
(value [, value ]* )	Creates a row from a list of values.
MAP ‘[’ key ‘]’	Returns the element of a map with a particular key.
ARRAY ‘[’ index ‘]’	Returns the element at a particular location in an array.
ARRAY ‘[’ value [, value ]* ‘]’	Creates an array from a list of values.
MAP ‘[’ key, value [, key, value ]* ‘]’	Creates a map from a list of key-value pairs.

Operator syntax

Description

ROW (value [, value ]*)

Creates a row from a list of values.

(value [, value ]* )

Creates a row from a list of values.

MAP ‘[’ key ‘]’

Returns the element of a map with a particular key.

ARRAY ‘[’ index ‘]’

Returns the element at a particular location in an array.

ARRAY ‘[’ value [, value ]* ‘]’

Creates an array from a list of values.

MAP ‘[’ key, value [, key, value ]* ‘]’

Creates a map from a list of key-value pairs.

11.2.3. Non Scalar Functions

The supported functions for non scalar types are shown in the below table:

Operator syntax	Description
ELEMENT(value)	Returns the sole element of an array or multiset; null if the collection is empty; throws if it has more than one element.
CARDINALITY(value)	Returns the number of elements in an array or multiset.
value MEMBER OF multiset	Returns whether the value is a member of multiset.
multiset IS A SET	Whether multiset is a set (has no duplicates).
multiset IS NOT A SET	Whether multiset is not a set (has duplicates).
multiset IS EMPTY	Whether multiset contains zero elements.
multiset IS NOT EMPTY	Whether multiset contains one or more elements.
multiset SUBMULTISET OF multiset2	Whether multiset is a submultiset of multiset2.
multiset NOT SUBMULTISET OF multiset2	Whether multiset is not a submultiset of multiset2.
multiset MULTISET UNION [ ALL \| DISTINCT ] multiset2	Returns the union multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).
multiset MULTISET INTERSECT [ ALL \| DISTINCT ] multiset2	Returns the intersection of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).
multiset MULTISET EXCEPT [ ALL \| DISTINCT ] multiset2	Returns the difference of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

Operator syntax

Description

ELEMENT(value)

Returns the sole element of an array or multiset; null if the collection is empty; throws if it has more than one element.

CARDINALITY(value)

Returns the number of elements in an array or multiset.

value MEMBER OF multiset

Returns whether the value is a member of multiset.

multiset IS A SET

Whether multiset is a set (has no duplicates).

multiset IS NOT A SET

Whether multiset is not a set (has duplicates).

multiset IS EMPTY

Whether multiset contains zero elements.

multiset IS NOT EMPTY

Whether multiset contains one or more elements.

multiset SUBMULTISET OF multiset2

Whether multiset is a submultiset of multiset2.

multiset NOT SUBMULTISET OF multiset2

Whether multiset is not a submultiset of multiset2.

multiset MULTISET UNION [ ALL | DISTINCT ] multiset2

Returns the union multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET INTERSECT [ ALL | DISTINCT ] multiset2

Returns the intersection of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET EXCEPT [ ALL | DISTINCT ] multiset2

Returns the difference of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

Note: the UNNEST relational operator converts a collection to a relation.

12. Type Conversion

LeanXcale supports both implicit and explicit conversion in expressions.

12.1. Implicit and Explicit Type Conversion

It is recommended to specify explicit conversions, rather than implicit conversions, for these reasons:

SQL statements are easier to understand when you use explicit datatype conversion functions.
Implicit datatype conversion can have a negative impact on performance, especially if the datatype of a column value is converted to that of a constant rather than the other way around.
Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR value may return an unexpected format.

12.2. Explicit Type Conversion

Explicit type conversion is attained with the CAST function:

Operator syntax	Description
CAST(value AS type)	Converts a value to type.

Operator syntax

Description

CAST(value AS type)

Converts a value to type.

The supported types in the CAST function are:

type=
	typeName
		[ collectionsTypeName ]*

typeName=
	sqlTypeName
	| rowTypeName
 	| compoundIdentifier

sqlTypeName=
	CHAR [ precision ] [ charSet ]
	| VARCHAR [ precision ] [ charSet ]
	| DATE
	| TIME
	| TIMESTAMP
	| TIMESTAMP WITH TIME ZONE
	| CIDR
	| GEOMETRY
	| DECIMAL [ precision [, scale] ]
	| BOOLEAN
	| TINYINT
	| SMALLINT
	| INTEGER
	| BIGINT
	| BINARY [ precision ]
	| VARBINARY [ precision ]
	| REAL
	| DOUBLE
	| FLOAT
	| ANY [ precision [, scale] ]

collectionsTypeName=
	ARRAY | MULTISET

rowTypeName=
	ROW '('
		fieldName1 fieldType1 [ NULL | NOT NULL ]
		[ , fieldName2 fieldType2 [ NULL | NOT NULL ] ]*
	')'

char=
	CHARACTER | CHAR

varchar=
	char VARYING | VARCHAR

decimal=
	DECIMAL | DEC | NUMERIC

integer:
	INTEGER | INT

varbinary=
	BINARY VARYING | VARBINARY

double=
	DOUBLE [ PRECISION ]

time=
	TIME [ precision ] [ timeZone ]

timestamp=
	TIMESTAMP [ precision ] [ timeZone ]

charSet=
	CHARACTER SET charSetName

timeZone=
	WITHOUT TIME ZONE
	| WITH LOCAL TIME ZONE

12.3. Implicit Type Conversion

refers to the automatic conversion of one data type to another by the database system without requiring explicit instructions from the programmer. This conversion occurs during the execution of SQL statements or expressions. The following table shows all possible conversions, without regard to the context in which it is made. The following convention is used:

i: implicit cast
e: explicit cast
x: not allowed

FROM - TO	NULL	BOOLEAN	TINYINT	SMALLINT	INT	BIGINT	DECIMAL	FLOAT or REAL	DOUBLE	INTERVAL	DATE	TIME	TIMESTAMP	TIMESTAMP WITH TIME ZONE	CHAR or VARCHAR
BINARY or VARBINARY	CIDR	NULL	i	i	i	i	i	i	i	i	i	i	i	i	i
i	i	i	x	BOOLEAN	x	i	e	e	e	e	e	e	e	x	x
x	x	x	i	x	x	TINYINT	x	e	i	i	i	i	i	i	i
e	x	x	e	e	i	x	x	SMALLINT	x	e	i	i	i	i	i
i	i	e	x	x	e	e	i	x	x	INT	x	e	i	i	i
i	i	i	i	e	x	x	e	e	i	x	x	BIGINT	x	e	i
i	i	i	i	i	i	e	x	x	e	e	i	x	x	DECIMAL	x
e	i	i	i	i	i	i	i	e	x	x	e	e	i	x	x
FLOAT/REAL	x	e	i	i	i	i	i	i	i	x	x	x	e	e	i
x	x	DOUBLE	x	e	i	i	i	i	i	i	i	x	x	x	e
e	i	x	x	INTERVAL	x	x	e	e	e	e	e	x	x	i	x
x	x	e	x	x	DATE	x	x	x	x	x	x	x	x	x	x
i	x	i	e	i	x	x	TIME	x	x	x	x	x	x	x	x
x	x	x	i	e	e	i	x	x	TIMESTAMP	x	x	e	e	e	e
e	e	e	x	i	e	i	i	i	x	TIMESTAMP WITH TIME ZONE	x	x	e	e	e
e	e	e	e	x	i	e	i	i	i	x	x	CHAR or VARCHAR	x	e	i
i	i	i	i	i	i	i	i	i	i	i	i	i	i	BINARY or VARBINARY	x
x	x	x	x	x	x	x	x	x	e	e	e	x	i	i	x
CIDR	x	x	x	x	x	x	x	x	x	x	x	x	x	x	i

FROM - TO

NULL

BOOLEAN

TINYINT

SMALLINT

INT

BIGINT

DECIMAL

FLOAT or REAL

DOUBLE

INTERVAL

DATE

TIME

TIMESTAMP

TIMESTAMP WITH TIME ZONE

CHAR or VARCHAR

BINARY or VARBINARY

CIDR

NULL

BOOLEAN

TINYINT

SMALLINT

INT

BIGINT

DECIMAL

FLOAT/REAL

DOUBLE

INTERVAL

DATE

TIME

TIMESTAMP

TIMESTAMP WITH TIME ZONE

CHAR or VARCHAR

BINARY or VARBINARY

CIDR

12.3.1. Conversion Contexts and Strategies

Set operation (UNION, EXCEPT, INTERSECT): Analyze the data types of each row in every branch and determine the common type for each pair of fields
Binary arithmetic expression (+, -, &, ^, /, %): promote string operand to data type of the other numeric operand;
Binary comparison (=, <, ⇐, <>, >, >=): if operands are STRING and TIMESTAMP, promote to TIMESTAMP; make 1 = true and 0 = false always evaluate to TRUE. If there is numeric type operand, find common type for both operands.
IN sub-query: compare type of left hand side and right hand side, and find the common type. If it is a non scalar type, find the widest type for every field.
IN expression list: compare every expression to find the common type.
CASE WHEN expression or COALESCE: find the common widest type of the THEN and ELSE operands.
Character + INTERVAL or character - INTERVAL: promote character to TIMESTAMP.
Built-in function: look up the type families registered in the checker, find the family default type if checker rules allow it.
User-defined function (UDF): coerce based on the declared argument types of the eval() method.
INSERT and UPDATE: coerce a source field to counterpart target table field’s type if the two fields differ with type name or precision(scale).

Note:

Implicit type coercion of the following cases is ignored:

One of the type is ANY;
Type coercion within CHARACTER types are always ignored, i.e. from CHAR(20) to VARCHAR(30);
Type coercion from a numeric to another with higher precedence is ignored, i.e. from INT to LONG.

12.3.2. Strategies for Finding Common Type

If the operator has the expected data types, just take them as the desired one, e.g., the UDF would have eval() method which has reflection argument types.
If there is no expected data type but the data type families are registered, try to coerce the arguments to the family’s default data type, i.e. the String family will have a VARCHAR type.
If neither expected data type nor families are specified, try to find the tightest common type of the node types, i.e., INTEGER and DOUBLE will return DOUBLE, the numeric precision does not lose for this case.
If no tightest common type is found, try to find a wider type, i.e. VARCHAR and INTEGER will return INTEGER, we allow some precision loss when widening decimal to fractional, or promote to VARCHAR type.

13. Operators and Functions

The following sections provides a summary of all operators and functions supported by LeanXcale.

13.1. Comparison Operators

The following table provides the list of supported comparison operators:

Operator syntax	Description
value1 = value2	Equals
value1 <> value2	Not equal
value1 > value2	Greater than
value1 >= value2	Greater than or equal
value1 < value2	Less than
value1 <= value2	Less than or equal
value IS NULL	Whether value is null
value IS NOT NULL	Whether value is not null
value1 IS DISTINCT FROM value2	Whether two values are not equal, treating null values as the same
value1 IS NOT DISTINCT FROM value2	Whether two values are equal, treating null values as the same
value1 BETWEEN value2 AND value3	(value1 >= value2) AND (value1 <= value3)
value1 NOT BETWEEN value2 AND value3	Whether value1 is less than value2 or greater than value3
string1 LIKE patternString [ ESCAPE string3 ]	Whether string1 matches pattern patternString
string1 NOT LIKE patternString [ ESCAPE string3 ]	Whether string1 does not match pattern patternString
string1 SIMILAR TO patternString [ ESCAPE string3 ]	Whether string1 matches regular expression patternString
string1 NOT SIMILAR TO patternString [ ESCAPE string3 ]	Whether string1 does not match regular expression patternString
value IN (value [, value]*)	Whether value is equal to a value in a list
value NOT IN (value [, value]*)	Whether value is not equal to any value in a list
value IN (sub-query)	Whether value is equal to a row returned by sub-query
value NOT IN (sub-query)	Whether value is not equal to any row returned by sub-query
value comparison SOME (sub-query)	Whether value fulfills the comparison for at least one row returned by sub-query
value comparison ANY (sub-query)	Synonym for SOME
value comparison ALL (sub-query)	Whether value fulfills the comparison for every row returned by sub-query
EXISTS (sub-query)	Whether sub-query returns at least one row

Operator syntax

Description

value1 = value2

Equals

value1 <> value2

Not equal

value1 > value2

Greater than

value1 >= value2

Greater than or equal

value1 < value2

Less than

value1 <= value2

Less than or equal

value IS NULL

Whether value is null

value IS NOT NULL

Whether value is not null

value1 IS DISTINCT FROM value2

Whether two values are not equal, treating null values as the same

value1 IS NOT DISTINCT FROM value2

Whether two values are equal, treating null values as the same

value1 BETWEEN value2 AND value3

(value1 >= value2) AND (value1 <= value3)

value1 NOT BETWEEN value2 AND value3

Whether value1 is less than value2 or greater than value3

string1 LIKE patternString [ ESCAPE string3 ]

Whether string1 matches pattern patternString

string1 NOT LIKE patternString [ ESCAPE string3 ]

Whether string1 does not match pattern patternString

string1 SIMILAR TO patternString [ ESCAPE string3 ]

Whether string1 matches regular expression patternString

string1 NOT SIMILAR TO patternString [ ESCAPE string3 ]

Whether string1 does not match regular expression patternString

value IN (value [, value]*)

Whether value is equal to a value in a list

value NOT IN (value [, value]*)

Whether value is not equal to any value in a list

value IN (sub-query)

Whether value is equal to a row returned by sub-query

value NOT IN (sub-query)

Whether value is not equal to any row returned by sub-query

value comparison SOME (sub-query)

Whether value fulfills the comparison for at least one row returned by sub-query

value comparison ANY (sub-query)

Synonym for SOME

value comparison ALL (sub-query)

Whether value fulfills the comparison for every row returned by sub-query

EXISTS (sub-query)

Whether sub-query returns at least one row

13.2. Regular Expressions

LIKE and SIMILAR TO operators allow to perform pattern matching using regular expressions. You can use the special characters ('%','_') and you can also use regular expressions with a syntax similar to perl-like regular expressions. Specifically, regular expressions may contain.

char or char sequence: A single character or a sequence of characters that matches itself
'.': The dot is a wildcard that matches any single character except for a newline
'^': Matches the start of a line
'[a-z]', Represents a character class. In this example, it matches any lowercase letter from 'a' to 'z'
'[^a-z]': Represents a negated character class. It matches any character that is not a lowercase letter from 'a' to 'z'
'\a': Represents a predefined character class for alphanumeric characters. It includes letters (upper and lower case) and digits
'\u': Represents a predefined character class for uppercase letters
'\l': Represents a predefined character class for lowercase letters
'\b': Represents a predefined character class for whitespace characters, such as space and tab
'\w': Represents a predefined character class for word characters. It includes letters, digits, and underscore

They may also contain the following operators (from lowest to highest precedence):

'(' and ')': for grouping
'|': to provide alternatives
concatenation: implicit, no operator
'*': zero or more times
'+': one or more times
'?': zero or one time

It should be noted that the main difference between LIKE and SIMILAR TO is that LIKE implies the expression will start with the text in the expression while SIMILAR doesn’t imply '^'.

13.3. Logical Operators

The table below outlines the logical operators:

Operator syntax	Description
boolean1 OR boolean2	Whether boolean1 is TRUE or boolean2 is TRUE
boolean1 AND boolean2	Whether boolean1 and boolean2 are both TRUE
NOT boolean	Whether boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN
boolean IS FALSE	Whether boolean is FALSE; returns FALSE if boolean is UNKNOWN
boolean IS NOT FALSE	Whether boolean is not FALSE; returns TRUE if boolean is UNKNOWN
boolean IS TRUE	Whether boolean is TRUE; returns FALSE if boolean is UNKNOWN
boolean IS NOT TRUE	Whether boolean is not TRUE; returns TRUE if boolean is UNKNOWN
boolean IS UNKNOWN	Whether boolean is UNKNOWN
boolean IS NOT UNKNOWN	Whether boolean is not UNKNOWN

Operator syntax

Description

boolean1 OR boolean2

Whether boolean1 is TRUE or boolean2 is TRUE

boolean1 AND boolean2

Whether boolean1 and boolean2 are both TRUE

NOT boolean

Whether boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN

boolean IS FALSE

Whether boolean is FALSE; returns FALSE if boolean is UNKNOWN

boolean IS NOT FALSE

Whether boolean is not FALSE; returns TRUE if boolean is UNKNOWN

boolean IS TRUE

Whether boolean is TRUE; returns FALSE if boolean is UNKNOWN

boolean IS NOT TRUE

Whether boolean is not TRUE; returns TRUE if boolean is UNKNOWN

boolean IS UNKNOWN

Whether boolean is UNKNOWN

boolean IS NOT UNKNOWN

Whether boolean is not UNKNOWN

13.4. Mathematical Operators and Functions

This table provides an overview of arithmetic operators and functions:

Operator syntax	Description
+ numeric	Returns numeric
- numeric	Returns negative numeric
numeric1 + numeric2	Returns numeric1 plus numeric2
numeric1 - numeric2	Returns numeric1 minus numeric2
numeric1 * numeric2	Returns numeric1 multiplied by numeric2
numeric1 / numeric2	Returns numeric1 divided by numeric2
POWER(numeric1, numeric2)	Returns numeric1 raised to the power of numeric2
ABS(numeric)	Returns the absolute value of numeric
MOD(numeric1, numeric2)	Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative
SQRT(numeric)	Returns the square root of numeric
LN(numeric)	Returns the natural logarithm (base e) of numeric
LOG2(numeric)	Returns the base 2 logarithm of numeric
LOG10(numeric)	Returns the base 10 logarithm of numeric
LOG(numeric1 [, numeric2])	Returns the base numeric2 (default is 10) logarithm of numeric1
EXP(numeric)	Returns e raised to the power of numeric
CEIL(numeric)	Rounds numeric up, returning the smallest integer that is greater than or equal to numeric
FLOOR(numeric)	Rounds numeric down, returning the largest integer that is less than or equal to numeric
RANDOM()	Generates a random double between 0 and 1 inclusive
RANDOM_INT(numeric)	Generates a random integer between 0 and numeric-1 inclusive
RANDOM_SEED(bigint)	Seeds the random generator; valid within the current session
ACOS(numeric)	Returns the arc cosine of numeric
ACOSD(numeric)	Returns the arc cosine of numeric in degrees
ACOSH(numeric)	Returns the hyperbolic arc cosine of numeric
ASIN(numeric)	Returns the arc sine of numeric
ASIND(numeric)	Returns the arc sine of numeric in degrees
ASINH(numeric)	Returns the hyperbolic arc sine of numeric
ATAN(numeric)	Returns the arc tangent of numeric
ATAND(numeric)	Returns the arc tangent of numeric in degrees
ATANH(numeric)	Returns the hyperbolic arc tangent of numeric
ATAN2(numeric, numeric)	Returns the arc tangent of the numeric coordinates
ATAN2D(numeric, numeric)	Returns the arc tangent of the numeric coordinates in degrees
CBRT(numeric)	Returns the cube root of numeric
COS(numeric)	Returns the cosine of numeric
COSD(numeric)	Returns the cosine of numeric; argument in degrees
COSH(numeric)	Returns the hyperbolic cosine of numeric
COT(numeric)	Returns the cotangent of numeric
COTD(numeric)	Returns the cotangent of numeric; argument in degrees
DEGREES(numeric)	Converts numeric from radians to degrees
E()	Returns e
FACTORIAL(bigint)	Returns the factorial of bigint
GCD(bigint1, bigint2)	Returns the greatest common divisor of bigint1 and bigint2
LCM(bigint1, bigint2)	Returns the least common multiple of bigint1 and bigint2
PI()	Returns a value that is closer than any other value to pi
RADIANS(numeric)	Converts numeric from degrees to radians
ROUND(numeric1 [, numeric2])	Rounds numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point
SIGN(numeric)	Returns the signum of numeric
SIN(numeric)	Returns the sine of numeric
SIND(numeric)	Returns the sine of numeric; argument in degrees
SINH(numeric)	Returns the hyperbolic sine of numeric
TAN(numeric)	Returns the tangent of numeric
TAND(numeric)	Returns the tangent of numeric; argument in degrees
TANH(numeric)	Returns the hyperbolic tangent of numeric
TRUNCATE(numeric1 [, numeric2])	Truncates numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

Operator syntax

Description

+ numeric

Returns numeric

- numeric

Returns negative numeric

numeric1 + numeric2

Returns numeric1 plus numeric2

numeric1 - numeric2

Returns numeric1 minus numeric2

numeric1 * numeric2

Returns numeric1 multiplied by numeric2

numeric1 / numeric2

Returns numeric1 divided by numeric2

POWER(numeric1, numeric2)

Returns numeric1 raised to the power of numeric2

ABS(numeric)

Returns the absolute value of numeric

MOD(numeric1, numeric2)

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

SQRT(numeric)

Returns the square root of numeric

LN(numeric)

Returns the natural logarithm (base e) of numeric

LOG2(numeric)

Returns the base 2 logarithm of numeric

LOG10(numeric)

Returns the base 10 logarithm of numeric

LOG(numeric1 [, numeric2])

Returns the base numeric2 (default is 10) logarithm of numeric1

EXP(numeric)

Returns e raised to the power of numeric

CEIL(numeric)

Rounds numeric up, returning the smallest integer that is greater than or equal to numeric

FLOOR(numeric)

Rounds numeric down, returning the largest integer that is less than or equal to numeric

RANDOM()

Generates a random double between 0 and 1 inclusive

RANDOM_INT(numeric)

Generates a random integer between 0 and numeric-1 inclusive

RANDOM_SEED(bigint)

Seeds the random generator; valid within the current session

ACOS(numeric)

Returns the arc cosine of numeric

ACOSD(numeric)

Returns the arc cosine of numeric in degrees

ACOSH(numeric)

Returns the hyperbolic arc cosine of numeric

ASIN(numeric)

Returns the arc sine of numeric

ASIND(numeric)

Returns the arc sine of numeric in degrees

ASINH(numeric)

Returns the hyperbolic arc sine of numeric

ATAN(numeric)

Returns the arc tangent of numeric

ATAND(numeric)

Returns the arc tangent of numeric in degrees

ATANH(numeric)

Returns the hyperbolic arc tangent of numeric

ATAN2(numeric, numeric)

Returns the arc tangent of the numeric coordinates

ATAN2D(numeric, numeric)

Returns the arc tangent of the numeric coordinates in degrees

CBRT(numeric)

Returns the cube root of numeric

COS(numeric)

Returns the cosine of numeric

COSD(numeric)

Returns the cosine of numeric; argument in degrees

COSH(numeric)

Returns the hyperbolic cosine of numeric

COT(numeric)

Returns the cotangent of numeric

COTD(numeric)

Returns the cotangent of numeric; argument in degrees

DEGREES(numeric)

Converts numeric from radians to degrees

E()

Returns e

FACTORIAL(bigint)

Returns the factorial of bigint

GCD(bigint1, bigint2)

Returns the greatest common divisor of bigint1 and bigint2

LCM(bigint1, bigint2)

Returns the least common multiple of bigint1 and bigint2

PI()

Returns a value that is closer than any other value to pi

RADIANS(numeric)

Converts numeric from degrees to radians

ROUND(numeric1 [, numeric2])

Rounds numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

SIGN(numeric)

Returns the signum of numeric

SIN(numeric)

Returns the sine of numeric

SIND(numeric)

Returns the sine of numeric; argument in degrees

SINH(numeric)

Returns the hyperbolic sine of numeric

TAN(numeric)

Returns the tangent of numeric

TAND(numeric)

Returns the tangent of numeric; argument in degrees

TANH(numeric)

Returns the hyperbolic tangent of numeric

TRUNCATE(numeric1 [, numeric2])

Truncates numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

13.5. String Operators and Functions

A summary of string operators and functions is depicted in the below table:

Operator syntax	Description
string \|\| string	Concatenates two character strings
CONCAT_WS(string0, string1, string2 [, …])	Concatenates two or more character strings using string0 as delimiter
CHAR_LENGTH(string)	Returns the number of characters in a character string
CHARACTER_LENGTH(string)	As CHAR_LENGTH(string)
UPPER(string)	Returns a character string converted to upper case
LOWER(string)	Returns a character string converted to lower case
POSITION(string1 IN string2)	Returns the position of the first occurrence of string1 in string2
POSITION(string1 IN string2 FROM integer)	Returns the position of the first occurrence of string1 in string2 starting at a given point (not standard SQL)
TRIM( { BOTH \| LEADING \| TRAILING } string1 FROM string2)	Removes the longest string containing only the characters in string1 from the start/end/both ends of string1
OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])	Replaces a substring of string1 with string2
SUBSTRING(string FROM integer)	Returns a substring of a character string starting at a given point
SUBSTRING(string FROM integer FOR integer)	Returns a substring of a character string starting at a given point with a given length
INITCAP(string)	Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.
LEFT(string, integer)	Returns the first integer characters of string
RIGHT(string, integer)	Returns the last integer characters of string
LPAD(string1, integer [, string2])	Fill up string1 to length integer by prepending the characters string2 (a space by default)
RPAD(string1, integer [, string2])	Fill up string1 to length integer by appending the characters string2 (a space by default)
REVERSE(string)	Returns string with characters in reverse order
REPEAT(string, integer)	Returns string repeated integer number of times
REPLACE(string1, string2, string3)	Replace all occurrences in string1 of substring string2 with substring string3
SPLIT(string1, string2)	Split string1 using string2 as delimiter and return the resulting array of strings
SPLIT_PART(string1, string2, integer)	Split string1 using string2 as delimiter and return the element with index integer (counting from one)
STARTS_WITH(string1, string2 [, boolean])	Checks whether string1 starts with string2 as prefix; boolean sets case sensitivity (default is true)
ENDS_WITH(string1, string2 [, boolean])	Checks whether string1 ends with string2 as suffix; boolean sets case sensitivity (default is true)
CONTAINS_SUBSTR(string1, string2 [, boolean])	Checks whether string1 contains string2; boolean sets case sensitivity (default is true)
REGEXP_LIKE(string1, string2)	Checks whether string1 matches the regular expression string2
REGEXP_REPLACE(string1, string2, string3)	Replaces in string1 each substring that matches the regular expression string2 with string3

Operator syntax

Description

string || string

Concatenates two character strings

CONCAT_WS(string0, string1, string2 [, …])

Concatenates two or more character strings using string0 as delimiter

CHAR_LENGTH(string)

Returns the number of characters in a character string

CHARACTER_LENGTH(string)

As CHAR_LENGTH(string)

UPPER(string)

Returns a character string converted to upper case

LOWER(string)

Returns a character string converted to lower case

POSITION(string1 IN string2)

Returns the position of the first occurrence of string1 in string2

POSITION(string1 IN string2 FROM integer)

Returns the position of the first occurrence of string1 in string2 starting at a given point (not standard SQL)

TRIM( { BOTH | LEADING | TRAILING } string1 FROM string2)

Removes the longest string containing only the characters in string1 from the start/end/both ends of string1

OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])

Replaces a substring of string1 with string2

SUBSTRING(string FROM integer)

Returns a substring of a character string starting at a given point

SUBSTRING(string FROM integer FOR integer)

Returns a substring of a character string starting at a given point with a given length

INITCAP(string)

Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

LEFT(string, integer)

Returns the first integer characters of string

RIGHT(string, integer)

Returns the last integer characters of string

LPAD(string1, integer [, string2])

Fill up string1 to length integer by prepending the characters string2 (a space by default)

RPAD(string1, integer [, string2])

Fill up string1 to length integer by appending the characters string2 (a space by default)

REVERSE(string)

Returns string with characters in reverse order

REPEAT(string, integer)

Returns string repeated integer number of times

REPLACE(string1, string2, string3)

Replace all occurrences in string1 of substring string2 with substring string3

SPLIT(string1, string2)

Split string1 using string2 as delimiter and return the resulting array of strings

SPLIT_PART(string1, string2, integer)

Split string1 using string2 as delimiter and return the element with index integer (counting from one)

STARTS_WITH(string1, string2 [, boolean])

Checks whether string1 starts with string2 as prefix; boolean sets case sensitivity (default is true)

ENDS_WITH(string1, string2 [, boolean])

Checks whether string1 ends with string2 as suffix; boolean sets case sensitivity (default is true)

CONTAINS_SUBSTR(string1, string2 [, boolean])

Checks whether string1 contains string2; boolean sets case sensitivity (default is true)

REGEXP_LIKE(string1, string2)

Checks whether string1 matches the regular expression string2

REGEXP_REPLACE(string1, string2, string3)

Replaces in string1 each substring that matches the regular expression string2 with string3

13.6. Binary String Operators and Functions

The following table provides a summary of binary string operators and functions:

Operator syntax	Description
binary \|\| binary	Concatenates two binary strings
POSITION(binary1 IN binary2)	Returns the position of the first occurrence of binary1 in binary2
POSITION(binary1 IN binary2 FROM integer)	Returns the position of the first occurrence of binary1 in binary2 starting at a given point (not standard SQL)
OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ])	Replaces a substring of binary1 with binary2
SUBSTRING(binary FROM integer)	Returns a substring of binary starting at a given point
SUBSTRING(binary FROM integer FOR integer)	Returns a substring of binary starting at a given point with a given length

Operator syntax

Description

binary || binary

Concatenates two binary strings

POSITION(binary1 IN binary2)

Returns the position of the first occurrence of binary1 in binary2

POSITION(binary1 IN binary2 FROM integer)

Returns the position of the first occurrence of binary1 in binary2 starting at a given point (not standard SQL)

OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ])

Replaces a substring of binary1 with binary2

SUBSTRING(binary FROM integer)

Returns a substring of binary starting at a given point

SUBSTRING(binary FROM integer FOR integer)

Returns a substring of binary starting at a given point with a given length

13.7. Operator Precedence

The following table summarizes the operator precedence and associativity, from highest to lowest.

Operator	Associativity
.	left
::	left
[ ] (array element)	left
+ - (unary plus, minus)	right
* / %
	left
+ -	left
BETWEEN, IN, LIKE, SIMILAR, OVERLAPS, CONTAINS etc.	-
< > = ⇐ >= <> !=	left
IS NULL, IS FALSE, IS NOT TRUE etc.	-
NOT	right
AND	left
OR	left

Operator

Associativity

left

[ ] (array element)

left

+ - (unary plus, minus)

right

* / %

left

+ -

left

BETWEEN, IN, LIKE, SIMILAR, OVERLAPS, CONTAINS etc.

< > = ⇐ >= <> !=

left

IS NULL, IS FALSE, IS NOT TRUE etc.

NOT

right

AND

left

13.8. Network Functions

The below table provides a quick reference of Network Functions:

Operator syntax	Description
network_contains(cidr1, cidr2)	Check whether cidr1 contains cidr2
network_is_contained_by(cidr1, cidr2)	Check whether cidr1 is contained by cidr2
network_family(cidr)	Returns cidr’s network family
network_masklen(cidr)	Returns cidr’s mask length
network_broadcast(cidr)	Returns the broadcast address for the given cidr
network_netmask(cidr)	Returns the net mask address for the given cidr
network_hostmask(cidr)	Returns the host mask address for the given cidr
network_network(cidr)	Returns the network address for the given cidr
network_host(cidr)	Extract the address for the given cidr

Operator syntax

Description

network_contains(cidr1, cidr2)

Check whether cidr1 contains cidr2

network_is_contained_by(cidr1, cidr2)

Check whether cidr1 is contained by cidr2

network_family(cidr)

Returns cidr’s network family

network_masklen(cidr)

Returns cidr’s mask length

network_broadcast(cidr)

Returns the broadcast address for the given cidr

network_netmask(cidr)

Returns the net mask address for the given cidr

network_hostmask(cidr)

Returns the host mask address for the given cidr

network_network(cidr)

Returns the network address for the given cidr

network_host(cidr)

Extract the address for the given cidr

13.9. Date and Time Functions

The below table provides a quick reference of Data and Time Functions:

Operator syntax Description

Operator syntax	Description
LOCALTIME	Returns the current date and time in the session time zone in a value of datatype TIME
LOCALTIME(precision)	Returns the current date and time in the session time zone in a value of datatype TIME, with precision digits of precision
LOCALTIMESTAMP	Returns the current date and time in the session time zone in a value of datatype TIMESTAMP
LOCALTIMESTAMP(precision)	Returns the current date and time in the session time zone in a value of datatype TIMESTAMP, with precision digits of precision
CURRENT_TIME	Returns the current time in the session time zone, in a value of datatype TIMESTAMP
CURRENT_DATE	Returns the current date in the session time zone, in a value of datatype DATE
CURRENT_TIMESTAMP	Returns the current date and time in the session time zone, in a value of datatype TIMESTAMP
EXTRACT(timeUnit FROM datetime)	Extracts and returns the value of a specified datetime field from a datetime value expression
FLOOR(datetime TO timeUnit)	Rounds datetime down to timeUnit
CEIL(datetime TO timeUnit)	Rounds datetime up to timeUnit
YEAR(date)	Equivalent to `EXTRACT(YEAR FROM date)`. Returns an integer.
QUARTER(date)	Equivalent to `EXTRACT(QUARTER FROM date)`. Returns an integer between 1 and 4.
MONTH(date)	Equivalent to `EXTRACT(MONTH FROM date)`. Returns an integer between 1 and 12.
WEEK(date)	Equivalent to `EXTRACT(WEEK FROM date)`. Returns an integer between 1 and 53.
DAYOFYEAR(date)	Equivalent to `EXTRACT(DOY FROM date)`. Returns an integer between 1 and 366.
DAYOFMONTH(date)	Equivalent to `EXTRACT(DAY FROM date)`. Returns an integer between 1 and 31.
DAYOFWEEK(date)	Equivalent to `EXTRACT(DOW FROM date)`. Returns an integer between 1 and 7.
HOUR(date)	Equivalent to `EXTRACT(HOUR FROM date)`. Returns an integer between 0 and 23.
MINUTE(date)	Equivalent to `EXTRACT(MINUTE FROM date)`. Returns an integer between 0 and 59.
SECOND(date)	Equivalent to `EXTRACT(SECOND FROM date)`. Returns an integer between 0 and 59.
TIMESTAMPADD(timeUnit, integer, datetime)	Returns datetime with an interval of (signed) integer timeUnits added. Equivalent to `datetime + INTERVAL 'integer' timeUnit`
TIMESTAMPDIFF(timeUnit, datetime, datetime2)	Returns the (signed) number of timeUnit intervals between datetime and datetime2.
LAST_DAY(date)	Returns the date of the last day of the month in a value of datatype DATE; For example, it returns DATE’2020-02-29’ for both DATE’2020-02-10’ and TIMESTAMP’2020-02-10 10:10:10’
CURRENT_MILLIS()	Returns the number of milliseconds since the epoch
CURRENT_SECONDS()	Returns the number of seconds since the epoch
TO_TIMESTAMP_MILLIS(numeric)	Returns the timestamp that is numeric number of milliseconds since the epoch
TO_TIMESTAMP_SECONDS(numeric)	Returns the timestamp that is numeric number of seconds since the epoch
TO_EPOCH_MILLIS(timestamp)	Returns the number of milliseconds from the epoch to timestamp
TO_EPOCH_SECONDS(timestamp)	Returns the number of seconds from the epoch to timestamp

LOCALTIME

Returns the current date and time in the session time zone in a value of datatype TIME

LOCALTIME(precision)

Returns the current date and time in the session time zone in a value of datatype TIME, with precision digits of precision

LOCALTIMESTAMP

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP

LOCALTIMESTAMP(precision)

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP, with precision digits of precision

CURRENT_TIME

Returns the current time in the session time zone, in a value of datatype TIMESTAMP

CURRENT_DATE

Returns the current date in the session time zone, in a value of datatype DATE

CURRENT_TIMESTAMP

Returns the current date and time in the session time zone, in a value of datatype TIMESTAMP

EXTRACT(timeUnit FROM datetime)

Extracts and returns the value of a specified datetime field from a datetime value expression

FLOOR(datetime TO timeUnit)

Rounds datetime down to timeUnit

CEIL(datetime TO timeUnit)

Rounds datetime up to timeUnit

YEAR(date)

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

QUARTER(date)

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

MONTH(date)

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

WEEK(date)

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

DAYOFYEAR(date)

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

DAYOFMONTH(date)

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

DAYOFWEEK(date)

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

HOUR(date)

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

MINUTE(date)

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

SECOND(date)

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

TIMESTAMPADD(timeUnit, integer, datetime)

Returns datetime with an interval of (signed) integer timeUnits added. Equivalent to datetime + INTERVAL 'integer' timeUnit

TIMESTAMPDIFF(timeUnit, datetime, datetime2)

Returns the (signed) number of timeUnit intervals between datetime and datetime2.

LAST_DAY(date)

Returns the date of the last day of the month in a value of datatype DATE; For example, it returns DATE’2020-02-29’ for both DATE’2020-02-10’ and TIMESTAMP’2020-02-10 10:10:10’

CURRENT_MILLIS()

Returns the number of milliseconds since the epoch

CURRENT_SECONDS()

Returns the number of seconds since the epoch

TO_TIMESTAMP_MILLIS(numeric)

Returns the timestamp that is numeric number of milliseconds since the epoch

TO_TIMESTAMP_SECONDS(numeric)

Returns the timestamp that is numeric number of seconds since the epoch

TO_EPOCH_MILLIS(timestamp)

Returns the number of milliseconds from the epoch to timestamp

TO_EPOCH_SECONDS(timestamp)

Returns the number of seconds from the epoch to timestamp

Where:

timeUnit=
	MILLENNIUM | CENTURY | DECADE | YEAR
	| QUARTER | MONTH | WEEK | DOY | DOW | DAY
	| HOUR | MINUTE | SECOND | EPOCH

Note:

As stated above, by default, DATE, TIME and TIMESTAMP have no time zone. For those types, there is no implicit time zone, such as UTC (as in Java) or the local time zone. It is left to the user or application to supply a time zone. In turn, TIMESTAMP WITH LOCAL TIME ZONE does not store the time zone internally, times will be stored in UTC and it will rely on the supplied time zone to provide correct semantics.
Interval literals may only use time units YEAR, MONTH, DAY, HOUR, MINUTE and SECOND.

====Conditional Functions and Operators

You can find an overview of conditional functions and operators in the following table:

Operator syntax Description

Operator syntax	Description
CASE value WHEN value1 [, value11 ]* THEN result1 [ WHEN valueN [, valueN1 ]* THEN resultN ]* [ ELSE resultZ ] END	Enumerated values
CASE WHEN condition1 THEN result1 [ WHEN conditionN THEN resultN ]* [ ELSE resultZ ] END	Enumerated conditions
NULLIF(value, value)	Returns NULL if the values are the same. For example, `NULLIF(5, 5)` returns NULL; `NULLIF(5, 0)` returns 5.
COALESCE(value, value [, value ]*)	returns the value of the first expression that is not NULL, or NULL otherwise For example, `COALESCE(NULL, 5)` returns 5.

CASE value
WHEN value1 [, value11 ]* THEN result1
[ WHEN valueN [, valueN1 ]* THEN resultN ]*
[ ELSE resultZ ]
END

Enumerated values

CASE
WHEN condition1 THEN result1
[ WHEN conditionN THEN resultN ]*
[ ELSE resultZ ]
END

Enumerated conditions

NULLIF(value, value)

Returns NULL if the values are the same.

For example, NULLIF(5, 5) returns NULL; NULLIF(5, 0) returns 5.

COALESCE(value, value [, value ]*)

returns the value of the first expression that is not NULL, or NULL otherwise

For example, COALESCE(NULL, 5) returns 5.

13.10. Period Functions

A period is defined by two points in time and can be done in the following ways:

period=
	(datetime, datetime)
	| (datetime, interval)
	| PERIOD (datetime, datetime)
	| PERIOD (datetime, interval)

There are a number of supported functions over periods:

Operator syntax	Description
period1 CONTAINS datetime dt	Checks if the time period represented by period1 contains the specified datetime 'dt'.
period1 CONTAINS period2	Verifies if the time period represented by period1 contains the entire time period period2.
period1 OVERLAPS period2	Determines if there is any overlap between the time periods represented by period1 and period2. If there is any common time interval, the condition is true.
period1 EQUALS period2	Checks if the time period represented by period1 is equal to the time period period2 in its entirety.
period1 PRECEDES period2	Validates whether the time period represented by period1 comes before (precedes) the time period period2. It doesn’t necessarily mean immediate succession.
period1 IMMEDIATELY PRECEDES period2	Checks if the time period represented by period1 immediately precedes the time period period2 without any gap in between.
period1 SUCCEEDS period2	Verifies whether the time period represented by period1 comes after (succeeds) the time period period2. Similar to PRECEDES, it doesn’t imply immediate succession.
period1 IMMEDIATELY SUCCEEDS period2	Checks if the time period represented by period1 immediately succeeds the time period period2 without any gap.

Operator syntax

Description

period1 CONTAINS datetime dt

Checks if the time period represented by period1 contains the specified datetime 'dt'.

period1 CONTAINS period2

Verifies if the time period represented by period1 contains the entire time period period2.

period1 OVERLAPS period2

Determines if there is any overlap between the time periods represented by period1 and period2. If there is any common time interval, the condition is true.

period1 EQUALS period2

Checks if the time period represented by period1 is equal to the time period period2 in its entirety.

period1 PRECEDES period2

Validates whether the time period represented by period1 comes before (precedes) the time period period2. It doesn’t necessarily mean immediate succession.

period1 IMMEDIATELY PRECEDES period2

Checks if the time period represented by period1 immediately precedes the time period period2 without any gap in between.

period1 SUCCEEDS period2

Verifies whether the time period represented by period1 comes after (succeeds) the time period period2. Similar to PRECEDES, it doesn’t imply immediate succession.

period1 IMMEDIATELY SUCCEEDS period2

Checks if the time period represented by period1 immediately succeeds the time period period2 without any gap.

Where period1 and period2 are period expressions.

13.11. Aggregate functions

Syntax:

aggregateCall=
	agg( [ ALL | DISTINCT ] value [, value ]*)
		[ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
		[ FILTER (WHERE condition) ]
	| agg(*) [ FILTER (WHERE condition) ]

where agg is one of the operators in the following table, or a user-defined aggregate function.

If FILTER is present, the aggregate function only considers rows for which condition evaluates to TRUE.

If DISTINCT is present, duplicate argument values are eliminated before being passed to the aggregate function.

If WITHIN GROUP is present, the aggregate function sorts the input rows according to the ORDER BY clause inside WITHIN GROUP before aggregating values. WITHIN GROUP is only allowed for hypothetical set functions (RANK, DENSE_RANK, PERCENT_RANK and CUME_DIST), inverse distribution functions (PERCENTILE_CONT and PERCENTILE_DISC) and collection functions (COLLECT and LISTAGG).

In orderItem, if expression is a positive integer n, it denotes the nth item in the SELECT clause. As an example:

SELECT F1, F2, F3, F4 FROM T ORDER BY 1,2

This will order by F1 and then F2 which are the first and second in the SELECT clause.

Operator syntax Description

Operator syntax	Description
COLLECT( [ ALL \| DISTINCT ] value)	Returns a multiset of the values
LISTAGG( [ ALL \| DISTINCT ] value [, separator])	Returns values concatenated into a string, delimited by separator (default ‘,’)
COUNT( [ ALL \| DISTINCT ] value [, value ]*)	Returns the number of input rows for which value is not null (wholly not null if value is composite)
COUNT(*)	Returns the number of input rows
FUSION(multiset)	Returns the multiset union of multiset across all input values
APPROX_COUNT_DISTINCT(value [, value ]*)	Returns the approximate number of distinct values of value; the database is allowed to use an approximation but is not required to
AVG( [ ALL \| DISTINCT ] numeric)	Returns the average (arithmetic mean) of numeric across all input values
SUM( [ ALL \| DISTINCT ] numeric)	Returns the sum of numeric across all input values
MAX( [ ALL \| DISTINCT ] value)	Returns the maximum value of value across all input values
MIN( [ ALL \| DISTINCT ] value)	Returns the minimum value of value across all input values
ANY_VALUE( [ ALL \| DISTINCT ] value)	Returns one of the values of value across all input values; this is NOT specified in the SQL standard
BIT_AND( [ ALL \| DISTINCT ] value)	Returns the bitwise AND of all non-null input values, or null if none
BIT_OR( [ ALL \| DISTINCT ] value)	Returns the bitwise OR of all non-null input values, or null if none
BIT_XOR( [ ALL \| DISTINCT ] value)	Returns the bitwise XOR of all non-null input values, or null if none
STDDEV_POP( [ ALL \| DISTINCT ] numeric)	Returns the population standard deviation of numeric across all input values
STDDEV_SAMP( [ ALL \| DISTINCT ] numeric)	Returns the sample standard deviation of numeric across all input values
STDDEV( [ ALL \| DISTINCT ] numeric)	Synonym for `STDDEV_SAMP`
VAR_POP( [ ALL \| DISTINCT ] value)	Returns the population variance (square of the population standard deviation) of numeric across all input values
VAR_SAMP( [ ALL \| DISTINCT ] numeric)	Returns the sample variance (square of the sample standard deviation) of numeric across all input values
COVAR_POP(numeric1, numeric2)	Returns the population covariance of the pair (numeric1, numeric2) across all input values
COVAR_SAMP(numeric1, numeric2)	Returns the sample covariance of the pair (numeric1, numeric2) across all input values
REGR_COUNT(numeric1, numeric2)	Returns the number of rows where both dependent and independent expressions are not null
REGR_SXX(numeric1, numeric2)	Returns the sum of squares of the dependent expression in a linear regression model
REGR_SYY(numeric1, numeric2)	Returns the sum of squares of the independent expression in a linear regression model
FIRST_IN_ORDER(column1, column2)	Returns the first value of column1 as ordered by column2 within the aggregate group
LAST_IN_ORDER(column1, column2)	Returns the last value of column1 as ordered by column2 within the aggregate group

COLLECT( [ ALL | DISTINCT ] value)

Returns a multiset of the values

LISTAGG( [ ALL | DISTINCT ] value [, separator])

Returns values concatenated into a string, delimited by separator (default ‘,’)

COUNT( [ ALL | DISTINCT ] value [, value ]*)

Returns the number of input rows for which value is not null (wholly not null if value is composite)

COUNT(*)

Returns the number of input rows

FUSION(multiset)

Returns the multiset union of multiset across all input values

APPROX_COUNT_DISTINCT(value [, value ]*)

Returns the approximate number of distinct values of value; the database is allowed to use an approximation but is not required to

AVG( [ ALL | DISTINCT ] numeric)

Returns the average (arithmetic mean) of numeric across all input values

SUM( [ ALL | DISTINCT ] numeric)

Returns the sum of numeric across all input values

MAX( [ ALL | DISTINCT ] value)

Returns the maximum value of value across all input values

MIN( [ ALL | DISTINCT ] value)

Returns the minimum value of value across all input values

ANY_VALUE( [ ALL | DISTINCT ] value)

Returns one of the values of value across all input values; this is NOT specified in the SQL standard

BIT_AND( [ ALL | DISTINCT ] value)

Returns the bitwise AND of all non-null input values, or null if none

BIT_OR( [ ALL | DISTINCT ] value)

Returns the bitwise OR of all non-null input values, or null if none

BIT_XOR( [ ALL | DISTINCT ] value)

Returns the bitwise XOR of all non-null input values, or null if none

STDDEV_POP( [ ALL | DISTINCT ] numeric)

Returns the population standard deviation of numeric across all input values

STDDEV_SAMP( [ ALL | DISTINCT ] numeric)

Returns the sample standard deviation of numeric across all input values

STDDEV( [ ALL | DISTINCT ] numeric)

Synonym for STDDEV_SAMP

VAR_POP( [ ALL | DISTINCT ] value)

Returns the population variance (square of the population standard deviation) of numeric across all input values

VAR_SAMP( [ ALL | DISTINCT ] numeric)

Returns the sample variance (square of the sample standard deviation) of numeric across all input values

COVAR_POP(numeric1, numeric2)

Returns the population covariance of the pair (numeric1, numeric2) across all input values

COVAR_SAMP(numeric1, numeric2)

Returns the sample covariance of the pair (numeric1, numeric2) across all input values

REGR_COUNT(numeric1, numeric2)

Returns the number of rows where both dependent and independent expressions are not null

REGR_SXX(numeric1, numeric2)

Returns the sum of squares of the dependent expression in a linear regression model

REGR_SYY(numeric1, numeric2)

Returns the sum of squares of the independent expression in a linear regression model

FIRST_IN_ORDER(column1, column2)

Returns the first value of column1 as ordered by column2 within the aggregate group

LAST_IN_ORDER(column1, column2)

Returns the last value of column1 as ordered by column2 within the aggregate group

13.12. Window Functions

Window functions are a category of SQL functions that operate on a specific "window" or subset of rows within a result set. These functions provide a way to perform calculations across a range of rows that are related to the current row. The syntax is as follows:

windowedAggregateCall=
	agg( [ ALL | DISTINCT ] value [, value ]*)
		[ RESPECT NULLS | IGNORE NULLS ]
		[ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
		[ FILTER (WHERE condition) ]
			OVER window
	| agg(*)
		[ FILTER (WHERE condition) ]
		OVER window

where agg is one of the operators in the following table, or a user-defined aggregate function.

DISTINCT, FILTER, and WITHIN GROUP behave as described for aggregate functions.

The supported operators are summarized in the following table:

Operator syntax	Description
COUNT(value [, value ]*) OVER window	Returns the number of rows in window for which value is not null (wholly not null if value is composite)
COUNT(*) OVER window	Returns the number of rows in window
AVG(numeric) OVER window	Returns the average (arithmetic mean) of numeric across all values in window
SUM(numeric) OVER window	Returns the sum of numeric across all values in window
MAX(value) OVER window	Returns the maximum value of value across all values in window
MIN(value) OVER window	Returns the minimum value of value across all values in window
RANK() OVER window	Returns the rank of the current row with gaps; same as ROW_NUMBER of its first peer
DENSE_RANK() OVER window	Returns the rank of the current row without gaps; this function counts peer groups
ROW_NUMBER() OVER window	Returns the number of the current row within its partition, counting from 1
FIRST_VALUE(value) OVER window	Returns value evaluated at the row that is the first row of the window frame
LAST_VALUE(value) OVER window	Returns value evaluated at the row that is the last row of the window frame
LEAD(value, offset, default) OVER window	Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL
LAG(value, offset, default) OVER window	Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL
NTH_VALUE(value, nth) OVER window	Returns value evaluated at the row that is the nth row of the window frame
NTILE(value) OVER window	Returns an integer ranging from 1 to value, dividing the partition as equally as possible

Operator syntax

Description

COUNT(value [, value ]*) OVER window

Returns the number of rows in window for which value is not null (wholly not null if value is composite)

COUNT(*) OVER window

Returns the number of rows in window

AVG(numeric) OVER window

Returns the average (arithmetic mean) of numeric across all values in window

SUM(numeric) OVER window

Returns the sum of numeric across all values in window

MAX(value) OVER window

Returns the maximum value of value across all values in window

MIN(value) OVER window

Returns the minimum value of value across all values in window

RANK() OVER window

Returns the rank of the current row with gaps; same as ROW_NUMBER of its first peer

DENSE_RANK() OVER window

Returns the rank of the current row without gaps; this function counts peer groups

ROW_NUMBER() OVER window

Returns the number of the current row within its partition, counting from 1

FIRST_VALUE(value) OVER window

Returns value evaluated at the row that is the first row of the window frame

LAST_VALUE(value) OVER window

Returns value evaluated at the row that is the last row of the window frame

LEAD(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

LAG(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

NTH_VALUE(value, nth) OVER window

Returns value evaluated at the row that is the nth row of the window frame

NTILE(value) OVER window

Returns an integer ranging from 1 to value, dividing the partition as equally as possible

Note:

You may specify null treatment (IGNORE NULLS, RESPECT NULLS) for FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD and LAG functions. The syntax handled by the parser, but only RESPECT NULLS is implemented at runtime.

13.13. Grouping Functions

LeanXcale supports the following grouping functions:

Operator syntax Description

Operator syntax	Description
GROUPING(expression [, expression ]*)	Returns a bit vector of the given grouping expressions
GROUP_ID()	Returns an integer that uniquely identifies the combination of grouping keys
GROUPING_ID(expression [, expression ]*)	Synonym for `GROUPING`

GROUPING(expression [, expression ]*)

Returns a bit vector of the given grouping expressions

GROUP_ID()

Returns an integer that uniquely identifies the combination of grouping keys

GROUPING_ID(expression [, expression ]*)

Synonym for GROUPING

13.14. Grouped Window Functions

Grouped window functions occur in the GROUP BY clause and define a key value that represents a window containing several rows.

13.14.1. CTUMBLE

In streaming queries, TUMBLE assigns a window for each row of a relation based on a timestamp column. An assigned window is specified by its beginning and ending. All assigned windows have the same length, and that’s why tumbling sometimes is named as “fixed windowing”. CTUMBLE function works in a similar way as TUMBLE does, but for regular tables. Basically, it allows to group by a timestamp column indicating the granularity of the group by means of interval that specifies the temporal length of the window (e.g., INTERVAL '1' HOUR) and optionally it can indicate the start of such windows as a timestamp (e.g., '1970-01-01 01:01:00.000').

Operator syntax	Description
CTUMBLE(datetime COLUMN, interval [, time ])	Indicates a tumbling window of interval for datetime, optionally aligned at time

Operator syntax

Description

CTUMBLE(datetime COLUMN, interval [, time ])

Indicates a tumbling window of interval for datetime, optionally aligned at time

Here is an example. We have a table T with the following schema and contents:

SELECT CAST(CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000') AS TIMESTAMP) AS GTIMEST,
	keyid,
	SUM(metric1) AS SUMM1,
	SUM(metric2) AS SUMM2,
FROM TIMEMETRICS
WHERE keyid <= '1'
GROUP BY CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000'), keyid
ORDER BY 1,2

In the query above you will be getting the addition of the metrics for every key and for every hour (60 minute) interval. Since the initial time was at minute 01, all grouped values are at an hour with one minute.

The result obtained would look similar to:

GTIMEST	KEYID	SUMM1	SUMM2
2020-09-27 17:01:00	0	10	2.5
2020-09-27 17:01:00	1	10	2.5
2020-09-27 18:01:00	0	20	5.0
2020-09-27 18:01:00	1	20	5.0
2020-09-27 19:01:00	0	30	7.5
2020-09-27 19:01:00	1	30	7.5

13.14.2. Grouped Auxiliary Functions

Grouped auxiliary functions allow you to access properties of a window defined by a grouped window function:

Operator syntax Description

Operator syntax	Description
HOP_END(expression, slide, size [, time ])	Returns the value of expression at the end of the window defined by a `HOP` function call
HOP_START(expression, slide, size [, time ])	Returns the value of expression at the beginning of the window defined by a `HOP` function call
SESSION_END(expression, interval [, time])	Returns the value of expression at the end of the window defined by a `SESSION` function call
SESSION_START(expression, interval [, time])	Returns the value of expression at the beginning of the window defined by a `SESSION` function call
TUMBLE_END(expression, interval [, time ])	Returns the value of expression at the end of the window defined by a `TUMBLE` function call
TUMBLE_START(expression, interval [, time ])	Returns the value of expression at the beginning of the window defined by a `TUMBLE` function call

HOP_END(expression, slide, size [, time ])

Returns the value of expression at the end of the window defined by a HOP function call

HOP_START(expression, slide, size [, time ])

Returns the value of expression at the beginning of the window defined by a HOP function call

SESSION_END(expression, interval [, time])

Returns the value of expression at the end of the window defined by a SESSION function call

SESSION_START(expression, interval [, time])

Returns the value of expression at the beginning of the window defined by a SESSION function call

TUMBLE_END(expression, interval [, time ])

Returns the value of expression at the end of the window defined by a TUMBLE function call

TUMBLE_START(expression, interval [, time ])

Returns the value of expression at the beginning of the window defined by a TUMBLE function call

13.15. Time Series Functions: GAPFILL, INTERPOLATE, and LOCF

The GAPFILL operator can be used as a top-level function in a SELECT statement. It operates on timestamp expressions, aligned at a particular time unit, to fill missing values so that the result of the SELECT statement has no gaps with respect to the time unit. The values can then be filled with nulls, interpolated, or repeated. The input of the GAPFILL function is typically a time bucketing aggregate that aligns timestamps to a particular time unit, such as the functions FLOOR or CTUMBLE. Example:

create table abc(a integer primary key, b timestamp, c double);
insert into abc values(1, timestamp'2024-07-11 03:00:00', 10);
insert into abc values(2, timestamp'2024-07-11 03:00:03', 30);
select GAPFILL(ctumble(b, interval '1' second)), avg(c) as c
	from abc
		group by ctumble(b, interval '1' second);

Its signature is:

GAPFILL(ts timestamp [, start_ts timestamp, end_ts timestamp]) RETURNS timestamp

Note that the call to the GAPFILL function should be placed at the top level, i.e., in the SELECT list and not in the GROUP BY expression. This adds two values for the missing timestamps at 03:00:01 and 03:00:02, without filling the values at column c:

2024-07-11 03:00:00.0   10
2024-07-11 03:00:01.0
2024-07-11 03:00:02.0
2024-07-11 03:00:03.0   30

Alternatively, one can choose to fill the gaps with interpolated values or to repeat values by applying the INTERPOLATE or LOCF functions on the aggregated column:

select GAPFILL(ctumble(b, interval '1' second)),
	INTERPOLATE(avg(c))as c_interp, LOCF(avg(c)) as c_rep
from abc
group by ctumble(b, interval '1' second);

The output would be:

2024-07-11 03:00:00.0   10                             	10
2024-07-11 03:00:01.0   16.667                         	10
2024-07-11 03:00:02.0   23.333                         	10
2024-07-11 03:00:03.0   30                             	30

Their signatures are:

INTERPOLATE(a numeric) RETURNS numeric
LOCF(a numeric) RETURNS numeric

The CTUMBLE function can work with intervals that are arbitrary multiples of a millisecond time unit, i.e. milliseconds, seconds, minutes, hours, and days. The FLOOR function can also be used for time bucketing, only on single time units, but can be used with month and year time units. For example:

-- insert one more row for another month
insert into abc
	values(3, timestamp'2024-09-11 03:00:03', 30);

select GAPFILL(floor(b to month)), locf(avg(c))
	from abc
		group by floor(b to month);

Would yield as output:

  2024-07-01 00:00:00.0   20
  2024-08-01 00:00:00.0   20
  2024-09-01 00:00:00.0   30

And if the result should contain timestamps beyond the range defined by the underlying data, the range bounds can be specified as additional arguments to the gapfill function call:

select GAPFILL(floor(b to month), timestamp'2024-06-01 00:00:00',
timestamp'2024-10-01 00:00:00'), locf(avg(c))
	from abc
		group by floor(b to month);

Whose output is:

  2024-06-01 00:00:00.0
  2024-07-01 00:00:00.0   20
  2024-08-01 00:00:00.0   20
  2024-09-01 00:00:00.0   30
  2024-10-01 00:00:00.0   30

13.16. Approximate/Statistical Aggregate Functions

In this section we discuss a set of functions that are designed for estimating the cardinality of large datasets, frequency of distinct elements, percentiles, etc. with reduced memory usage and improved performance compared to exact methods.

13.16.1. HyperLogLog functions: HLL, DISTINCT_COUNT, and RELATIVE_ERROR

HyperLogLog (HLL) applies a probabilistic approach to approximate distinct values count using the HyperLogLog++ algorithm. For large datasets and datasets with high cardinality (many distinct values), this can be much more efficient in both CPU and memory than an exact count. HLL is an aggregate function, which returns an intermediate structure, that can be used to extract the distinct values count or be further involved in another HLL aggregation. The optional buckets parameter specifies the number of buckets for the HyperLogLog algorithm; it must be a value between 16 and 262144; if omitted, a default value of 4096 is applied.

Example:

-- approximates select count(distinct b) from abc; with error err
-- data contains 150000 distinct values; the result is approximately 150000
select distinct_count(hll(b)) as cnt,
	relative_error(hll(b)) as err
from abc;

-- aggregates on intermediate HLL structures
-- performs two-level aggregation on overlapping partitions of the data, each of 100000 distinct values
-- the result is still approximately 150000
select distinct_count(hll(h)) as cnt,
	relative_error(hll(h)) as err
from (
	select hll(b) as h from abc where a <= 100000
	union all
	select hll(b) as h from abc where a > 50000
);

The signatures of the functions are:

HLL(a any [, buckets integer]) RETURNS hll
DISTINCT_COUNT(h hll) RETURNS bigint
RELATIVE_ERROR(h hll) RETURNS double
HLL(h hll) RETURNS hll

13.16.2. Approximate Percentiles with UDDSketch: UDDS, APPROX_PERCENTILE, APPROX_PERCENTILE_RANK, etc.

UDDSketch estimates the value at a given percentile, or the percentile rank of a given value, using the UddSketch algorithm. This estimation is more memory- and CPU-efficient than an exact calculation. UDDS is an aggregate function, which returns an intermediate structure, that can be used to extract percentiles or ranks or be further involved in another UDDS aggregation. If optional parameters buckets and maxerr are omitted, default values of 200 buckets and 0.001 for the maximal error are applied.

Example:

-- approximates the value at the 75-th percentile and the rank (between 0 and 1) of value 13
select approx_percentile(udds(b), 0.75),
	approx_percentile_rank(udds(b), 13)
from abc;

-- aggregates on intermediate UDDS structures; returns the same as the above query
select approx_percentile(udds(u), 0.75),
	approx_percentile_rank(udds(u), 13)
from (
	select udds(b) as u from abc where a < 10
	union all
	select udds(b) as u from abc where a >= 10
);

The signatures are:

UDDS(a numeric [, buckets integer, maxerr double]) RETURNS udds
APPROX_PERCENTILE(u udds, p double) RETURNS double
APPROX_PERCENTILE_ARRAY(u udds, p double[]) RETURNS double[]
APPROX_PERCENTILE_RANK(u udds, v double) RETURNS double
UDDS_ERROR(u udds) RETURNS double
UDDS_MEAN(u udds) RETURNS double
UDDS_NVALS(u udds) RETURNS integer
UDDS(u udds) RETURNS udds

13.16.3. Statistical Aggregate Functions: STATS, STATS_MEAN, STATS_STDDEV, STATS_SKEWNESS, STATS_KURTOSIS, STATS_SUM, STATS_COUNT

STATS is an aggregate function, which performs common statistical analysis into an intermediate structure, that can be used to extract statistical metrics or be further involved in another STATS aggregation.

Example:

-- returns the stddev, kurtosis, and skewness of values in column b
select stats_stddev(stats(b)),
	stats_kurtosis(stats(b)),
	stats_skewness(stats(b))
from abc;

-- aggregates on intermediate STATS structures; returns the same as the above query
select stats_stddev(stats(s)),
	stats_kurtosis(stats(s)),
	stats_skewness(stats(s))
from (
	select stats(b) as s from abc where a < 10
	union all
	select stats(b) as s from abc where a >= 10
);

The signature of the functions are:

STATS(a numeric) RETURNS stats
STATS_MEAN(s stats) RETURNS double
STATS_STDDEV(s stats) RETURNS double
STATS_SKEWNESS(s stats) RETURNS double
STATS_KURTOSIS(s stats) RETURNS double
STATS_SUM(s stats) RETURNS double
STATS_COUNT(s stats) RETURNS bigint
STATS(s stats) RETURNS stats

13.17. JSON Functions

LeanXcale also supports a number of functions to operate over strings containing JSON values. We use the following convention in the following subsections:

jsonValue is a character string containing a JSON value;
path is a character string containing a JSON path expression; mode flag strict or lax should be specified in the beginning of path.

13.17.1. JSON Query Functions

These are the query function predicates over JSON values:

Operator syntax	Description
JSON_EXISTS(jsonValue, path [ { TRUE \| FALSE \| UNKNOWN \| ERROR } ON ERROR ] )	Whether a jsonValue satisfies a search criterion described using JSON path expression path
JSON_VALUE(jsonValue, path [ RETURNING type ] [ { ERROR \| NULL \| DEFAULT expr } ON EMPTY ] [ { ERROR \| NULL \| DEFAULT expr } ON ERROR ] )	Extract an SQL scalar from a jsonValue using JSON path expression path
JSON_QUERY(jsonValue, path [ { WITHOUT [ ARRAY ] \| WITH [ CONDITIONAL \| UNCONDITIONAL ] [ ARRAY ] } WRAPPER ] [ { ERROR \| NULL \| EMPTY ARRAY \| EMPTY OBJECT } ON EMPTY ] [ { ERROR \| NULL \| EMPTY ARRAY \| EMPTY OBJECT } ON ERROR ] )	Extract a JSON object or JSON array from jsonValue using the path JSON path expression
JSON_HAS_VALUE(jsonValue1, jsonValue2)	Whether jsonValue1 contains jsonValue2

Operator syntax

Description

JSON_EXISTS(jsonValue, path [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ] )

Whether a jsonValue satisfies a search criterion described using JSON path expression path

JSON_VALUE(jsonValue, path [ RETURNING type ] [ { ERROR | NULL | DEFAULT expr } ON EMPTY ] [ { ERROR | NULL | DEFAULT expr } ON ERROR ] )

Extract an SQL scalar from a jsonValue using JSON path expression path

JSON_QUERY(jsonValue, path [ { WITHOUT [ ARRAY ] | WITH [ CONDITIONAL | UNCONDITIONAL ] [ ARRAY ] } WRAPPER ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON EMPTY ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON ERROR ] )

Extract a JSON object or JSON array from jsonValue using the path JSON path expression

JSON_HAS_VALUE(jsonValue1, jsonValue2)

Whether jsonValue1 contains jsonValue2

Note:

The ON ERROR and ON EMPTY clauses define the fallback behavior of the function when an error is thrown or a null value is about to be returned.
The ARRAY WRAPPER clause defines how to represent a JSON array result in JSON_QUERY function. The following examples compare the wrapper behaviors.

Sample Data:

{"a": "[1,2]", "b": [1,2], "c": "hi"}

These are the JSON query functions: Comparison:

Operator	$.a	$.b	$.c
JSON_VALUE	[1, 2]	error	hi
JSON QUERY WITHOUT ARRAY WRAPPER	error	[1, 2]	error
JSON QUERY WITH UNCONDITIONAL ARRAY WRAPPER	[ “[1,2]” ]	[ [1,2] ]	[ “hi” ]
JSON QUERY WITH CONDITIONAL ARRAY WRAPPER	[ “[1,2]” ]	[1,2]	[ “hi” ]

Operator

$.a

$.b

$.c

JSON_VALUE

[1, 2]

error

JSON QUERY WITHOUT ARRAY WRAPPER

error

[1, 2]

error

JSON QUERY WITH UNCONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[ [1,2] ]

[ “hi” ]

JSON QUERY WITH CONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[1,2]

[ “hi” ]

13.17.2. JSON Constructor Functions

These are the JSON value constructor functions:

Operator syntax	Description
JSON_OBJECT( { [ KEY ] name VALUE value [ FORMAT JSON ] \| name : value [ FORMAT JSON ] } * [ { NULL \| ABSENT } ON NULL ] )	Construct JSON object using a series of key (name) value (value) pairs
JSON_OBJECTAGG( { [ KEY ] name VALUE value [ FORMAT JSON ] \| name : value [ FORMAT JSON ] } [ { NULL \| ABSENT } ON NULL ] )	Aggregate function to construct a JSON object using a key (name) value (value) pair
JSON_ARRAY( { value [ FORMAT JSON ] } * [ { NULL \| ABSENT } ON NULL ] )	Construct a JSON array using a series of values (value)
JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL \| ABSENT } ON NULL ] )	Aggregate function to construct a JSON array using a value (value)

Operator syntax

Description

JSON_OBJECT( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct JSON object using a series of key (name) value (value) pairs

JSON_OBJECTAGG( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON object using a key (name) value (value) pair

JSON_ARRAY( { value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct a JSON array using a series of values (value)

JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON array using a value (value)

Note:

The flag FORMAT JSON indicates the value is formatted as JSON character string. When FORMAT JSON is used, the value should be de-parse from JSON character string to a SQL structured value.
ON NULL clause defines how the JSON output represents null values. The default null behavior of JSON_OBJECT and JSON_OBJECTAGG is NULL ON NULL, and for JSON_ARRAY and JSON_ARRAYAGG it is ABSENT ON NULL.
If ORDER BY clause is provided, JSON_ARRAYAGG sorts the input rows into the specified order before performing aggregation.

13.17.3. JSON Validity Operators

The following functions enable to check the validity of a JSON value:

Operator syntax	Description
jsonValue IS JSON [ VALUE ]	Whether jsonValue is a JSON value
jsonValue IS NOT JSON [ VALUE ]	Whether jsonValue is not a JSON value
jsonValue IS JSON SCALAR	Whether jsonValue is a JSON scalar value
jsonValue IS NOT JSON SCALAR	Whether jsonValue is not a JSON scalar value
jsonValue IS JSON OBJECT	Whether jsonValue is a JSON object
jsonValue IS NOT JSON OBJECT	Whether jsonValue is not a JSON object
jsonValue IS JSON ARRAY	Whether jsonValue is a JSON array
jsonValue IS NOT JSON ARRAY	Whether jsonValue is not a JSON array

Operator syntax

Description

jsonValue IS JSON [ VALUE ]

Whether jsonValue is a JSON value

jsonValue IS NOT JSON [ VALUE ]

Whether jsonValue is not a JSON value

jsonValue IS JSON SCALAR

Whether jsonValue is a JSON scalar value

jsonValue IS NOT JSON SCALAR

Whether jsonValue is not a JSON scalar value

jsonValue IS JSON OBJECT

Whether jsonValue is a JSON object

jsonValue IS NOT JSON OBJECT

Whether jsonValue is not a JSON object

jsonValue IS JSON ARRAY

Whether jsonValue is a JSON array

jsonValue IS NOT JSON ARRAY

Whether jsonValue is not a JSON array

13.17.4. JSON Auxiliary Functions

These are some JSON auxiliary functions:

Operator syntax	Description
JSON_TYPE(jsonValue)	Returns a string value indicating the type of a jsonValue
JSON_DEPTH(jsonValue)	Returns an integer value indicating the depth of a jsonValue
JSON_PRETTY(jsonValue)	Returns a pretty-printing of jsonValue
JSON_LENGTH(jsonValue [, path ])	Returns a integer indicating the length of jsonValue
JSON_KEYS(jsonValue [, path ])	Returns a string indicating the keys of a JSON jsonValue
JSON_REMOVE(jsonValue, path[, path])	Removes data from jsonValue using a series of path expressions and returns the result

Operator syntax

Description

JSON_TYPE(jsonValue)

Returns a string value indicating the type of a jsonValue

JSON_DEPTH(jsonValue)

Returns an integer value indicating the depth of a jsonValue

JSON_PRETTY(jsonValue)

Returns a pretty-printing of jsonValue

JSON_LENGTH(jsonValue [, path ])

Returns a integer indicating the length of jsonValue

JSON_KEYS(jsonValue [, path ])

Returns a string indicating the keys of a JSON jsonValue

JSON_REMOVE(jsonValue, path[, path])

Removes data from jsonValue using a series of path expressions and returns the result

Note:

JSON_TYPE / JSON_DEPTH / JSON_PRETTY return null if the argument is null
JSON_LENGTH / JSON_KEYS / JSON_REMOVE return null if the first argument is null
JSON_TYPE generally returns an upper-case string flag indicating the type of the JSON input. Currently supported supported type flags are:
- INTEGER
- STRING
- FLOAT
- DOUBLE
- LONG
- BOOLEAN
- DATE
- OBJECT
- ARRAY
- NULL
JSON_DEPTH defines a JSON value’s depth as follows:
- An empty array, empty object, or scalar value has depth 1;
- A non-empty array containing only elements of depth 1 or non-empty object containing only member values of depth 1 has depth 2;
- Otherwise, a JSON document has depth greater than 2.
JSON_LENGTH defines a JSON value’s length as follows:
- A scalar value has length 1;
- The length of array or object is the number of elements is contains.

Usage Examples:

13.17.5. JSON_TYPE Example

This statement:

SELECT JSON_TYPE(v) AS c1,
	JSON_TYPE(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
	JSON_TYPE(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
	JSON_TYPE(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

returns:

c1	c2	c3	c4
OBJECT	ARRAY	INTEGER	BOOLEAN

OBJECT

ARRAY

INTEGER

BOOLEAN

13.17.6. JSON_DEPTH Example

This statement returns:

SELECT JSON_DEPTH(v) AS c1,
	JSON_DEPTH(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
	JSON_DEPTH(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
	JSON_DEPTH(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

returns:

c1	c2	c3	c4
3	2	1	1

13.17.7. JSON_LENGTH Example

This statement:

SELECT JSON_LENGTH(v) AS c1,
	JSON_LENGTH(v, 'lax $.a') AS c2,
	JSON_LENGTH(v, 'strict $.a[0]') AS c3,
	JSON_LENGTH(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true]}')) AS t(v)
LIMIT 10;

returns:

c1	c2	c3	c4
1	2	1	1

13.17.8. JSON_KEYS Example

This statement:

SELECT JSON_KEYS(v) AS c1,
	JSON_KEYS(v, 'lax $.a') AS c2,
	JSON_KEYS(v, 'lax $.b') AS c2,
	JSON_KEYS(v, 'strict $.a[0]') AS c3,
	JSON_KEYS(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true],"b": {"c": 30}}')) AS t(v)
LIMIT 10;

returns:

c1	c2	c3	c4	c5
[“a”, “b”]	NULL	[“c”]	NULL	NULL

[“a”, “b”]

NULL

[“c”]

NULL

13.17.9. JSON_REMOVE Example

This statement:

SELECT JSON_REMOVE(v, '$[1]') AS c1
FROM (VALUES ('["a", ["b", "c"], "d"]')) AS t(v)
LIMIT 10;

returns:

c1
[“a”, “d”]

[“a”, “d”]

13.17.10. JSON_STORAGE_SIZE Example

This statement:

SELECT
	JSON_STORAGE_SIZE('[100, \"sakila\", [1, 3, 5], 425.05]') AS c1,
	JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"a\", \"c\": \"[1, 3, 5, 7]\"}') AS c2,
	JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"xyz\", \"c\": \"[1, 3, 5, 7]\"}') AS c3,
	JSON_STORAGE_SIZE('[100, \"json\", [[10, 20, 30], 3, 5], 425.05]') AS c4
LIMIT 10;

returns:

c1	c2	c3	c4
29	35	37	36

13.17.11. DECODE Example

This statement:

SELECT DECODE(f1, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c1,
	DECODE(f2, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c2,
	DECODE(f3, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c3,
	DECODE(f4, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c4,
	DECODE(f5, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c5
FROM (VALUES (1, 2, 3, 4, 5)) AS t(f1, f2, f3, f4, f5);

returns:

c1	c2	c3	c4	c5
aa	bb	cc	dd	ee

13.17.12. TRANSLATE example

This statement:

SELECT TRANSLATE('Aa*Bb*Cc''D*d', ' */''%', '_') as c1,
	TRANSLATE('Aa/Bb/Cc''D/d', ' */''%', '_') as c2,
	TRANSLATE('Aa Bb Cc''D d', ' */''%', '_') as c3,
	TRANSLATE('Aa%Bb%Cc''D%d', ' */''%', '_') as c4
FROM (VALUES (true)) AS t(f0);

returns:

c1	c2	c3	c4
Aa_Bb_CcD_d	Aa_Bb_CcD_d	Aa_Bb_CcD_d	Aa_Bb_CcD_d

Aa_Bb_CcD_d

14. User-Defined Functions

LeanXcale is extensible. One can define new functions using user code. There are several ways to define a function.

To implement a scalar function, there are 3 options:

Create a class with a public static eval method, and register the class;
Create a class with a public non-static eval method, and a public constructor with no arguments, and register the class;
Create a class with one or more public static methods, and register each class/method combination.

To implement an aggregate function, there are 2 options:

Create a class with public static init, add and result methods, and register the class;
Create a class with public non-static init, add and result methods, and a public constructor with no arguments, and register the class.

Optionally, add a public merge method to the class; this allows Calcite to generate code that merges sub-totals.

Optionally, make your class implement the SqlSplittableAggFunction interface; this allows Calcite to decompose the function across several stages of aggregation, roll up from summary tables, and push it through joins.

To implement a table function, there are 3 options:

Create a class with a static eval method that returns ScannableTable or QueryableTable, and register the class;
Create a class with a non-static eval method that returns ScannableTable or QueryableTable, and register the class;
Create a class with one or more public static methods that return ScannableTable or QueryableTable, and register each class/method combination.

To implement a table macro, there are 3 options:

Create a class with a static eval method that returns TranslatableTable, and register the class;
Create a class with a non-static eval method that returns TranslatableTable, and register the class;
Create a class with one or more public static methods that return TranslatableTable, and register each class/method combination.

LeanXcale infers the parameter types and result type of a function from the parameter and return types of the Java method that implements it. Further, you can specify the name and optionality of each parameter using the Parameter annotation.

14.1. Calling Functions with Named and Optional Parameters

Usually when you call a function, you need to specify all of its parameters, in order. But that can be a problem if a function has a lot of parameters, and especially if you want to add more parameters over time.

To solve this problem, the SQL standard allows you to pass parameters by name, and to define parameters which are optional (that is, have a default value that is used if they are not specified).

Suppose you have a function f, declared as in the following pseudo syntax:

FUNCTION f(
	INTEGER a,
	INTEGER b DEFAULT NULL,
	INTEGER c,
	INTEGER d DEFAULT NULL,
	INTEGER e DEFAULT NULL) RETURNS INTEGER

All of the function’s parameters have names, and parameters b, d and e have a default value of NULL and are therefore optional. In LeanXcale NULL is the only allowable default value for optional parameters.

When calling a function with optional parameters, you can omit optional arguments at the end of the list, or use the DEFAULT keyword for any optional arguments. Here are some examples:

f(1, 2, 3, 4, 5) provides a value to each parameter, in order;
f(1, 2, 3, 4) omits e, which gets its default value, NULL;
f(1, DEFAULT, 3) omits d and e, and specifies to use the default value of b;
f(1, DEFAULT, 3, DEFAULT, DEFAULT) has the same effect as the previous example;
f(1, 2) is not legal, because c is not optional;
f(1, 2, DEFAULT, 4) is not legal, because c is not optional.

You can specify arguments by name using the => syntax. If one argument is named, they all must be. Arguments may be in any other, but must not specify any argument more than once, and you need to provide a value for every parameter which is not optional. Here are some examples:

f(c => 3, d => 1, a => 0) is equivalent to f(0, NULL, 3, 1, NULL);
f(c => 3, d => 1) is not legal, because you have not specified a value for a and a is not optional.

15. Pattern Matching of CEP Events with MATCH_RECOGNIZE

MATCH_RECOGNIZE is a SQL extension for recognizing sequences of events in complex event processing (CEP).

The syntax is as follows:

matchRecognize=
	MATCH_RECOGNIZE '('
		[ PARTITION BY expression [, expression ]* ]
		[ ORDER BY orderItem [, orderItem ]* ]
		[ MEASURES measureColumn [, measureColumn ]* ]
		[ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
		[ AFTER MATCH
			( SKIP TO NEXT ROW
			| SKIP PAST LAST ROW
			| SKIP TO FIRST variable
			| SKIP TO LAST variable
			| SKIP TO variable )
		]
		PATTERN '(' pattern ')'
			[ WITHIN intervalLiteral ]
			[ SUBSET subsetItem [, subsetItem ]* ]
		DEFINE variable AS condition [, variable AS condition ]*
	')'

subsetItem=
	variable = '(' variable [, variable ]* ')'

measureColumn=
	expression AS alias

pattern=
	patternTerm [ '|' patternTerm ]*

patternTerm=
	patternFactor [ patternFactor ]*

patternFactor=
	patternPrimary [ patternQuantifier ]

patternPrimary=
	variable
	|   '$'
	|   '^'
	|   '(' [ pattern ] ')'
	|   '{-' pattern '-}'
	|   PERMUTE '(' pattern [, pattern ]* ')'

patternQuantifier=
	'*'
	|   '*?'
	|   '+'
	|   '+?'
	|   '?'
	|   '??'
	|   '{' { [ minRepeat ], [ maxRepeat ] } '}' ['?']
	|   '{' repeat '}'

intervalLiteral=
	INTERVAL 'string' timeUnit [ TO timeUnit ]

In patternQuantifier, repeat is a positive integer, and minRepeat and maxRepeat are non-negative integers.

16. Copying Data Between Two Tables

Currently, the syntax CREATE TABLE … AS SELECT … is not supported. In order to copy data from one table to another, one first need to create the new table and then do INSERT/UPSERT … SELECT. This is an example:

CREATE TABLE TDEST (f1_key BIGINT, f2_value VARBINARY, PRIMARY KEY(f1_key));

INSERT INTO TDEST SELECT * FROM TORIG;

Additionally, the WITH TEMPORAL AS clause for using a temporal table can be used as in the following example:

CREATE TABLE Persons (
	ID int NOT NULL,
	NAME VARCHAR(20) NOT NULL,
	AGE int NOT NULL,
	SALARY double NOT NULL,
	PRIMARY KEY (ID)
);

INSERT INTO Persons (WITH TEMPORAL AS (SELECT 1, 'Juan', 10, cast(1550.1 as double)) SELECT * FROM temporal);

17. JDBC Function Escapes

JDBC function escapes are a mechanism provided by the JDBC API to invoke database-specific functions and procedures using standard SQL syntax within a Java application. JDBC function escapes are useful to use functions in Java code.

The syntax for JDBC function escapes involves using curly braces {} to enclose the call to the database-specific functions. The format typically looks like {fn functionName(argument1, argument2, …)}. The following sections provide the different functions for different data types.

17.1. Numeric

These are the Numeric JDBC function escapes:

Operator syntax	Description
{fn ABS(numeric)}	Returns the absolute value of numeric
{fn ACOS(numeric)}	Returns the arc cosine of numeric
{fn ASIN(numeric)}	Returns the arc sine of numeric
{fn ATAN(numeric)}	Returns the arc tangent of numeric
{fn ATAN2(numeric, numeric)}	Returns the arc tangent of the numeric coordinates
{fn CBRT(numeric)}	Returns the cube root of numeric
{fn CEILING(numeric)}	Rounds numeric up, and returns the smallest number that is greater than or equal to numeric
{fn COS(numeric)}	Returns the cosine of numeric
{fn COT(numeric)}	Returns the cotangent of numeric
{fn DEGREES(numeric)}	Converts numeric from radians to degrees
{fn EXP(numeric)}	Returns e raised to the power of numeric
{fn FLOOR(numeric)}	Rounds numeric down, and returns the largest number that is less than or equal to numeric
{fn LOG(numeric)}	Returns the natural logarithm (base e) of numeric
{fn LOG10(numeric)}	Returns the base-10 logarithm of numeric
{fn MOD(numeric1, numeric2)}	Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative
{fn PI()}	Returns a value that is closer than any other value to pi
{fn POWER(numeric1, numeric2)}	Returns numeric1 raised to the power of numeric2
{fn RADIANS(numeric)}	Converts numeric from degrees to radians
{fn RAND(numeric)}	Returns a random double using numeric as the seed value
{fn ROUND(numeric1, numeric2)}	Rounds numeric1 to numeric2 places right to the decimal point
{fn SIGN(numeric)}	Returns the signum of numeric
{fn SIN(numeric)}	Returns the sine of numeric
{fn SQRT(numeric)}	Returns the square root of numeric
{fn TAN(numeric)}	Returns the tangent of numeric
{fn TRUNCATE(numeric1, numeric2)}	Truncates numeric1 to numeric2 places right to the decimal point

Operator syntax

Description

{fn ABS(numeric)}

Returns the absolute value of numeric

{fn ACOS(numeric)}

Returns the arc cosine of numeric

{fn ASIN(numeric)}

Returns the arc sine of numeric

{fn ATAN(numeric)}

Returns the arc tangent of numeric

{fn ATAN2(numeric, numeric)}

Returns the arc tangent of the numeric coordinates

{fn CBRT(numeric)}

Returns the cube root of numeric

{fn CEILING(numeric)}

Rounds numeric up, and returns the smallest number that is greater than or equal to numeric

{fn COS(numeric)}

Returns the cosine of numeric

{fn COT(numeric)}

Returns the cotangent of numeric

{fn DEGREES(numeric)}

Converts numeric from radians to degrees

{fn EXP(numeric)}

Returns e raised to the power of numeric

{fn FLOOR(numeric)}

Rounds numeric down, and returns the largest number that is less than or equal to numeric

{fn LOG(numeric)}

Returns the natural logarithm (base e) of numeric

{fn LOG10(numeric)}

Returns the base-10 logarithm of numeric

{fn MOD(numeric1, numeric2)}

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

{fn PI()}

Returns a value that is closer than any other value to pi

{fn POWER(numeric1, numeric2)}

Returns numeric1 raised to the power of numeric2

{fn RADIANS(numeric)}

Converts numeric from degrees to radians

{fn RAND(numeric)}

Returns a random double using numeric as the seed value

{fn ROUND(numeric1, numeric2)}

Rounds numeric1 to numeric2 places right to the decimal point

{fn SIGN(numeric)}

Returns the signum of numeric

{fn SIN(numeric)}

Returns the sine of numeric

{fn SQRT(numeric)}

Returns the square root of numeric

{fn TAN(numeric)}

Returns the tangent of numeric

{fn TRUNCATE(numeric1, numeric2)}

Truncates numeric1 to numeric2 places right to the decimal point

17.2. String

These are the String JDBC function escapes:

Operator syntax	Description
{fn ASCII(string)}	Returns the ASCII code of the first character of string; if the first character is a non-ASCII character, returns its Unicode code point; returns 0 if string is empty
{fn CONCAT(character, character)}	Returns the concatenation of character strings
{fn INSERT(string1, start, length, string2)}	Inserts string2 into a slot in string1
{fn LCASE(string)}	Returns a string in which all alphabetic characters in string have been converted to lower case
{fn LENGTH(string)}	Returns the number of characters in a string
{fn LOCATE(string1, string2 [, integer])}	Returns the position in string2 of the first occurrence of string1. Searches from the beginning of string2, unless integer is specified.
{fn LEFT(string, length)}	Returns the leftmost length characters from string
{fn LTRIM(string)}	Returns string with leading space characters removed
{fn REPLACE(string, search, replacement)}	Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed
{fn REVERSE(string)}	Returns string with the order of the characters reversed
{fn RIGHT(string, integer)}	Returns the rightmost length characters from string
{fn RTRIM(string)}	Returns string with trailing space characters removed
{fn SUBSTRING(string, offset, length)}	Returns a character string that consists of length characters from string starting at the offset position
{fn UCASE(string)}	Returns a string in which all alphabetic characters in string have been converted to upper case

Operator syntax

Description

{fn ASCII(string)}

Returns the ASCII code of the first character of string; if the first character is a non-ASCII character, returns its Unicode code point; returns 0 if string is empty

{fn CONCAT(character, character)}

Returns the concatenation of character strings

{fn INSERT(string1, start, length, string2)}

Inserts string2 into a slot in string1

{fn LCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to lower case

{fn LENGTH(string)}

Returns the number of characters in a string

{fn LOCATE(string1, string2 [, integer])}

Returns the position in string2 of the first occurrence of string1. Searches from the beginning of string2, unless integer is specified.

{fn LEFT(string, length)}

Returns the leftmost length characters from string

{fn LTRIM(string)}

Returns string with leading space characters removed

{fn REPLACE(string, search, replacement)}

Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed

{fn REVERSE(string)}

Returns string with the order of the characters reversed

{fn RIGHT(string, integer)}

Returns the rightmost length characters from string

{fn RTRIM(string)}

Returns string with trailing space characters removed

{fn SUBSTRING(string, offset, length)}

Returns a character string that consists of length characters from string starting at the offset position

{fn UCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to upper case

17.3. Date/time

These are the DATE and TIME JDBC function escapes:

Operator syntax Description

Operator syntax	Description
{fn CURDATE()}	Equivalent to `CURRENT_DATE`
{fn CURTIME()}	Equivalent to `LOCALTIME`
{fn NOW()}	Equivalent to `LOCALTIMESTAMP`
{fn YEAR(date)}	Equivalent to `EXTRACT(YEAR FROM date)`. Returns an integer.
{fn QUARTER(date)}	Equivalent to `EXTRACT(QUARTER FROM date)`. Returns an integer between 1 and 4.
{fn MONTH(date)}	Equivalent to `EXTRACT(MONTH FROM date)`. Returns an integer between 1 and 12.
{fn WEEK(date)}	Equivalent to `EXTRACT(WEEK FROM date)`. Returns an integer between 1 and 53.
{fn DAYOFYEAR(date)}	Equivalent to `EXTRACT(DOY FROM date)`. Returns an integer between 1 and 366.
{fn DAYOFMONTH(date)}	Equivalent to `EXTRACT(DAY FROM date)`. Returns an integer between 1 and 31.
{fn DAYOFWEEK(date)}	Equivalent to `EXTRACT(DOW FROM date)`. Returns an integer between 1 and 7.
{fn HOUR(date)}	Equivalent to `EXTRACT(HOUR FROM date)`. Returns an integer between 0 and 23.
{fn MINUTE(date)}	Equivalent to `EXTRACT(MINUTE FROM date)`. Returns an integer between 0 and 59.
{fn SECOND(date)}	Equivalent to `EXTRACT(SECOND FROM date)`. Returns an integer between 0 and 59.
{fn TIMESTAMPADD(timeUnit, count, datetime)}	Adds an interval of count timeUnits to a datetime
{fn TIMESTAMPDIFF(timeUnit, timestamp1, timestamp2)}	Subtracts timestamp1 from timestamp2 and returns the result in timeUnits

{fn CURDATE()}

Equivalent to CURRENT_DATE

{fn CURTIME()}

Equivalent to LOCALTIME

{fn NOW()}

Equivalent to LOCALTIMESTAMP

{fn YEAR(date)}

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

{fn QUARTER(date)}

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

{fn MONTH(date)}

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

{fn WEEK(date)}

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

{fn DAYOFYEAR(date)}

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

{fn DAYOFMONTH(date)}

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

{fn DAYOFWEEK(date)}

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

{fn HOUR(date)}

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

{fn MINUTE(date)}

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

{fn SECOND(date)}

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

{fn TIMESTAMPADD(timeUnit, count, datetime)}

Adds an interval of count timeUnits to a datetime

{fn TIMESTAMPDIFF(timeUnit, timestamp1, timestamp2)}

Subtracts timestamp1 from timestamp2 and returns the result in timeUnits

17.4. System

These are the system JDBC function escapes:

Operator syntax Description

Operator syntax	Description
{fn DATABASE()}	Equivalent to `CURRENT_CATALOG`
{fn IFNULL(value1, value2)}	Returns value2 if value1 is null
{fn USER()}	Equivalent to `CURRENT_USER`

{fn DATABASE()}

Equivalent to CURRENT_CATALOG

{fn IFNULL(value1, value2)}

Returns value2 if value1 is null

{fn USER()}

Equivalent to CURRENT_USER

17.4.1. Conversion

These are the conversion JDBC function escapes:

Operator syntax	Description
{fn CONVERT(value, type)}	Cast value into type

Operator syntax

Description

{fn CONVERT(value, type)}

Cast value into type

DML Triggers

A trigger is procedural code that is automatically executed in response to certain events on a particular table. DML triggers run when a user tries to modify data through a data manipulation language (DML) event. They can be used to enforce business rules and data integrity, query other tables, and include complex statements. DML events are INSERT, UPDATE, or DELETE statements on a table.

CREATE TRIGGER myTrigger BEFORE INSERT UPDATE ON t
FOR EACH ROW EXECUTE genericBackupTrigger('t_backup,ts')
PRIORITY 10;

DISABLE TRIGGER myTrigger ON t;

ENABLE TRIGGER myTrigger ON t;

DROP TRIGGER myTrigger ON TABLE t;

1. CREATE TRIGGER Statement

The syntax to create a trigger is as follow:

createTrigger:
      CREATE TRIGGER [ IF NOT EXISTS ] triggerName
      (BEFORE | AFTER)
      [INSERT] [UPDATE] [DELETE]
      ON tableName FOR EACH ROW EXECUTE
      triggerFunction ['(' arg ')']
      [PRIORITY priority]

where

triggerName is the name used by the user to identify the trigger.
tableName is the table where the trigger applies
triggerFunction is the Trigger Function name as the result of getTriggerName(). See Trigger Function interface
arg is the trigger function argument as string literal passed in triggerExecutor(). Check [triggerExecutor params]
priority is an integer literal used to order teh trigger execution in case that the table have more than one. Note that the resulting new values from one trigger will be passed to the execution of the next one.

Note that you can define 1 to 3 dml events among INSERT, UPDATE and DELETE.

2. Trigger Functions System Table

To find out the loaded trigger functions, you can query TRIGGER_FUNCTIONS system table.

> select * from LXSYSMETA.TRIGGER_FUNCTIONS;
+---------------------------+-------------------------------------------------------------------------+
|        triggerName        |                               description                               |
+---------------------------+-------------------------------------------------------------------------+
| HashFieldTrigger          | Hash Field Trigger                                                      |
| UUIDTrigger               | UUID Trigger                                                            |
| AutoGeohashTrigger        | Auto Geohash Trigger                                                    |
| ForceAutoIncrementTrigger | Force Auto Increment Trigger                                            |
| AutoIncrementTrigger      | Auto Increment Trigger                                                  |
| DefaultValueTrigger       | Default Value Trigger                                                   |
| DeleteForeignKeyTrigger   | Check for Referential Integrity when deleting row from referenced table |
| ForeignKeyTrigger         | Check for Referential Integrity                                         |
+---------------------------+-------------------------------------------------------------------------+

3. Custom Trigger Functions

Trigger functions are java objects that are automatically loaded into the Query Engine and provide a mechanism to build the Trigger’s executors, that holds the actual trigger procedural code. In order to create a custom trigger function, you need to implement from TriggerFunctionI Interface and define the following methods:

ResultType	Function	Description
String	getTriggerName()	Define the trigger function name
String	description()	Define a description for the trigger function so users can figure out what it does
Object	buildArguments(String arg)	Build an instance of argumntes from freestyle string. The instance is passed to triggerExecutor method.
TriggerExecutor	triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object args)	Creates an istance of TriggerExecutor. See Trigger Executor Method

ResultType

Function

Description

String

getTriggerName()

Define the trigger function name

String

description()

Define a description for the trigger function so users can figure out what it does

Object

buildArguments(String arg)

Build an instance of argumntes from freestyle string. The instance is passed to triggerExecutor method.

TriggerExecutor

triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object args)

Creates an istance of TriggerExecutor. See Trigger Executor Method

public class GenericBackupTriggerExample implements TriggerFunctionI {
  static String triggerName = "genericBackupTrigger";

  @Override
  public String getTriggerName() {
    return triggerName;
  }

  @Override
  public Object buildArguments(String arg) {
    return arg.split(",");
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    String[] args = (String[]) arg;
    return new GenericBackupTriggerExecutor(triggerUtils, newValues, oldValues, args[0], args[1]);
  }

  @Override
  public String description() {
    return "Generic backup Trigger example";
  }
}

3.1. Method triggerExecutor

The method triggerExecutor builds an instance of TriggerExecutor accordingly to the given params.

Param position	Param type	Description
1	TriggerUtils	Utils. See Trigger Utils
2	TypedNamedValuesI	New values for table’s row. Null on deletion
3	TypedNamedValuesI	Old values for table’s row. Null on insertion
4	String	Trigger argument

Param position

Param type

Description

TriggerUtils

Utils. See Trigger Utils

TypedNamedValuesI

New values for table’s row. Null on deletion

TypedNamedValuesI

Old values for table’s row. Null on insertion

String

Trigger argument

3.2. Custom triggerExecutor

Trigger’s executors are the ones that actually holds the procedural code to be executed when a trigger is fired. You would just need to extend from TriggerExecutor abstract class and define the following methods:

ResultType	Function	Description
void	execute()	Executes the procedural code

ResultType

Function

Description

void

execute()

Executes the procedural code

In order to implement the trigger execution code, you might probably need to use the following final methods:

ResultType	Function	Description
boolean	hasValues()	Tell if the new values are available
boolean	hasOldValues()	Tell if the old values are available
ExpressionType	getType(String column)	Get the column’s type
Object	getValue(String column)	Get new value for column
Object	getOldValue(String column)	Get old value for column
void	setValue(String column, Object value)	Set new value for column
String[]	valuesFieldNames()	Get values field names

ResultType

Function

Description

boolean

hasValues()

Tell if the new values are available

boolean

hasOldValues()

Tell if the old values are available

ExpressionType

getType(String column)

Get the column’s type

Object

getValue(String column)

Get new value for column

Object

getOldValue(String column)

Get old value for column

void

setValue(String column, Object value)

Set new value for column

String[]

valuesFieldNames()

Get values field names

Note that TriggerExecutor abstract class constructor has the following parameters:

Param position	Param type	Description
1	TypedNamedValuesI	New values for table’s row
2	TypedNamedValuesI	Old values for table’s row

Param position

Param type

Description

TypedNamedValuesI

New values for table’s row

TypedNamedValuesI

Old values for table’s row

  static public class GenericBackupTriggerExecutor extends TriggerExecutor {

    public final String backupTableName;
    public final String tsColumnName;
    public final TriggerUtils triggerUtils;

    public GenericBackupTriggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, String backupTableName, String tsColumnName) {
      super(newValues, oldValues);
      this.backupTableName = backupTableName;
      this.tsColumnName = tsColumnName;
      this.triggerUtils = triggerUtils;
    }

    @Override
    public String getExecutorName() {
      return triggerName;
    }

    @Override
    public void execute() {
      if (!hasValues()) { // delete
        return;
      }
      try (Session session = triggerUtils.session()) {
        Table table = session.database().getTable(backupTableName);
        Tuple tuple = table.createTuple();
        for (String column : valuesFieldNames()) {
          tuple.put(column, getValue(column));
        }
        tuple.put(tsColumnName, Timestamp.valueOf(LocalDateTime.now()));
        table.insert(tuple);
        // no need to commit when sharing QE's transaction
      } catch (Exception exception) {
        throw new LeanxcaleRuntimeException(exception);
      }
    }
  }

4. Trigger Utils

An instance of trigger utils provides connection information and some other functionalities that could be useful in order to implemet a trigger like creating a no-SQL direct session, get the next or current value of a sequence, generate a uuid, etc. Find below the available methods:

ResultType	Function	Description
long	localTimestamp()	Returns the current date and time in the session time zone
long	currentTimestamp()	Returns the current date and time in the session adjusted time zone
Long	getTid()	Return current transaction
String	getDbName()	Return connection’s database
String	getSchemaName()	Return connection’s schema
String	uuid()	Generate a UUID
Connection	getConnection()	Get current JDBC connection
Session	session(Settings Settings)	Creates a new session to a Kivi database. It shares the current kivi connection.
Session	session()	Creates a new session to a Kivi database. It shares the current kivi connection and transaction to connect through the direct API.
long	currVal(String sequence)	Get current sequence value or null if it is not set
long	nextVal(String sequence)	Get next sequence value and set current sequence value
String	geohash(List<Object> geohashSourceValues)	Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

ResultType

Function

Description

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID

Connection

getConnection()

Get current JDBC connection

Session

session(Settings Settings)

Creates a new session to a Kivi database. It shares the current kivi connection.

Session

session()

Creates a new session to a Kivi database. It shares the current kivi connection and transaction to connect through the direct API.

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

As you might realize, TableFunctionUtils is an extended implementation of TriggerUtils, see Table Function Utils. You could use TableFunctionUtils within trigger’s code but we strongly recommend to use the given TriggerUtils instance so the trigger can be fired by no-SQL API events too.

5. Expose a Trigger as a Table Function

For developments environments, it could be interesting to execute a table funtion in order to test a trigger behavior. In order to do this, we provide an abstract class TriggerTableFunction that will save you from do it yourself. As you can see in the following example, you would just need to add the definitions of the table’s function methods getFunctionName() and eval(..).

public class BackupTriggerExample extends TriggerTableFunction {

  static String triggerName = "backupTrigger";
  static String[] columnNames = {"PK1", "PK2", "F3", "F4"};

  public static String getFunctionName() {
    return triggerName;
  }

  @Override
  public String getTriggerName() {
    return getFunctionName();
  }

  @Override
  public Object buildArguments(String s) {
    return s;
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    return new BackupTriggerExecutor(newValues, oldValues);
  }

  @Override
  public String description() {
    return "Trigger example";
  }

  public ScannableTable eval(Long pk1, String pk2, String f3, Integer f4) {
    Object[] values = {pk1, pk2, f3, f4};
    int[] types = {Types.BIGINT, Types.VARCHAR, Types.VARCHAR, Types.INTEGER};
    NamedValuesI newValues = new TypedNamedValuesImpl(columnNames, values, types);
    return evalTrigger(newValues, null, buildArguments(null));
  }
}

Don’t forget to return the evalTrigger(NamedValuesI newValues, NamedValuesI oldValues, String args) result so the trigger is executed and its result is returned by the table function. Note that evalTrigger params will be mapped to the triggerExecutor ones, and it will invoke buildArguemnts.

 try (ResultSet rs = s.executeQuery("exec backupTrigger(1, 'a', 'af3', null)")) {
        assertThat(rs.next(), is(true));
        assertThat(rs.getLong(1), is(1L));
        assertThat(rs.getString(2), is("a"));
        assertThat(rs.getString(3), is("af3"));
        assertThat(rs.getInt(4), is(100));
        assertThat(rs.next(), is(false));
      }

5.1. Typed Named Values

The TypedNamedValuesI interface provides a interface to access a value by name. This is the following:

ResultType

Function

Description

ExpressionType

getType(String name)

Get the column’s type

Object

getValue(String name)

Get the value for name. Null when not found or value is null

void

setValue(String name, Object value)

Set the value for name

String[]

getFieldNames()

Get field names

Object[]

getValues()

Get values ordered as getFieldNames()

boolean

contains(String name)

Tells if name exists

int

size()

Tells size of the values

String

sqlTableName()

Table’s name the values belong to. Null when unknown

Anyway, we provide a basic TypedNamedValuesI implementation that might satisfy your needs.

  /**
   * Create a TypedNamesValues
   * @param positionsMap Map value position by name
   * @param row values
   * @param typeMap Map SQL type by name
   */
  public TypedNamedValuesImpl(Map<String, Integer> positionsMap, Object[] row, Map<String, ExpressionType> typeMap) {
    ...
  }

  /**
   * Create a NamedValues object
   * @param names values names
   * @param row values ordered by names
   * @param types  SQL types ordered by names
   */
  public TypedNamedValuesImpl(String[] names, Object[] row, int[] types) {
    ...
  }

5.2. Expression Type

The ExpressionType interface provides information about a table’s column or an expression. This is the following:

ResultType

Function

Description

int

sqlType()

Expression’s SQL JDBC type

int

precision()

Expression’s SQL JDBC type precision

int

scale()

Expression’s SQL JDBC type scale

boolean

isNullable()

Tells if the expression is nullable

6. System Triggers

Despite of custom triggers, we provide some system triggers that could be used directly by the user.

6.1. Auto Increment Trigger

This trigger is created automatically for a table when using the IDENTITY clause on table’s creation or alteration. Its execution fills the desired column with sequence’s next val when it is null. Its argument is a list of strings separated by coma. Find below their meaning:

Argument position

Description

Name of the column to be assigned with sequence’s next val when null

Sequence name. We suggest to include the schema name so the trigger won’t fail when connected to a different schema

The following table’s creation is equivalent to the one below:

create table tauto (i int AS IDENTITY start with 10 increment by 10, v VARCHAR, PRIMARY KEY (i));

create table tauto (i int, v VARCHAR, PRIMARY KEY (i));

create sequence tauto_i_seq start with 10 increment by 10;

CREATE TRIGGER tauto_i_seq_trigger BEFORE INSERT ON tauto
FOR EACH ROW EXECUTE autoIncrementTrigger('i,tauto_i_seq')
PRIORITY 100;

-- Note that the autoincrement colums will become an implicit column for inserts
insert into tauto values ('value');

6.2. Geohash Trigger

This trigger is created automatically for a table when using the PRIMARY GEOHASH KEY or GEOHASH clauses on table’s creation. Hidden geohash fields are created too when using these clauses. The trigger execution fills the desired column with the calculated geohash when it is null. Its argument is a list of strings separated by coma. Find below their meaning:

Argument position

Description

Name of the column to be assigned with the calculated geohash

WKT field if there is only two arguments, or Latitude field if there is three

Longitude field if there is three arguemnts

The following table’s creation is equivalent to the one below:

create table t (wkt VARCHAR, latitude double, longitude double, PRIMARY GEOHASH KEY (wkt), GEOHASH (latitude,longitude));

create table t (wkt VARCHAR, latitude double, longitude double, gh_wkt VARCHAR PRIMARY KEY, gh_lat_long VARCHAR);

CREATE TRIGGER gh_wkt_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_wkt,wkt')
PRIORITY 100;

CREATE TRIGGER gh_lat_long_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_lat_long,latitude,longitude')
PRIORITY 100;

-- Note that GEOHASH clause will create and index too, so we create it here manually
CREATE INDEX gh_lat_long_index ON t (gh_lat_long);

-- Note that the geohash column will become an implicit column for inserts and selects
insert into t values (ST_asText(ST_MakePoint(3.41,40.25)),cast(40.25 as double), cast(3.41 as double));

Geohash fields are fields indexed geographically for GIS applications

6.3. UUID Trigger

This trigger is created automatically for a table when not defining a PRIMARY KEY. A hidden UUID column is created too. The trigger execution fills the desired column with an UUID valuen when it is null. Its argument is Name of the column to be assigned with uuid.

The following table’s creation is equivalent to the one below:

create table tauto (v VARCHAR);

create table t (id VARCHAR, v VARCHAR, PRIMARY KEY (id));

CREATE TRIGGER uuid_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE UUIDTrigger('id')
PRIORITY 100;

-- Note that the uuid column will become a implicit column for inserts and selects
insert into t values ('value');

6.4. Referential Integrity Triggers

Referential integrity is implemented as well using triggers. Every time a Foreign key constraint is added to a table, two triggers are added to the table and one to the referenced table. They are disabled by default. You should enable them to activate referential checks.

> create table tpk (i int, v VARCHAR, i2 int, PRIMARY KEY (i));

> create table tfk (pk int, v VARCHAR, fk int, PRIMARY KEY (pk), CONSTRAINT fKi2 FOREIGN KEY (fk) REFERENCES tpk(i));

> ENABLE TRIGGER fKi2 ON tfk;

> ENABLE TRIGGER fKi2 ON tpk;

> select * from LXSYSMETA.TRIGGERS;
triggerName      FKI2
triggerFunction  DeleteForeignKeyTrigger
args             {"fkTableName":"APP.TFK","columns":["I"],"fkColumns":["FK"]}
isBefore         true
isInsert         false
isUpdate         false
isDelete         true
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TPK

triggerName      FKI2_NO_CONFLICT
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":false,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         false
tableCat         db
tableSchem       APP
tableName        TFK

triggerName      FKI2
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":true,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TFK

The two triggers added to the table use the same trigger function ForeignKeyTrigger but with different params. One will only check that the referenced key exist and the other will create a conflict on it too. Both will be triggered on insertion. You would only need enable the one you need.

The trigger on the referenced table DeleteForeignKeyTrigger would prevent from deleting a row whose primary key is referenced by the table that holds the foreing key. Note that an index with same constraint name is created along with the foreign key’s triggers to improve DeleteForeignKeyTrigger performance.

Table Functions

A table function refers to a user-defined function that returns a table or a set of rows as its result. Table functions are used to encapsulate logic and calculations, allowing you to treat the result of the function as if it were a table in your SQL queries.

1. How to Develop and Deploy a Table Function

First you need to add the LeanXcale maven repository and the qe-tableFunctions dependency to your pom.xml file:

<repositories>
    <repository>
        <id>maven-releases</id>
        <url>https://nexus.leanxcale.com/repository/maven-releases</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.leanxcale</groupId>
        <artifactId>qe-tableFunctions</artifactId>
        <version>1.7.6</version>
    </dependency>
</dependencies>

In order to create a custom table function, a Java class has to be written implementing the AutoloadTableFunction interface and defining the following methods:

Function

Description

Return

getFunctionName()

Define the custom table function name

String

getDescription()

Provides the custom table function description

String

eval(<table function arguments>)

Generate the collection of rows to be queried

ScannableTable

You can define as many eval functions as you need as long as their signatures are different. In order to write the eval functions returning ScannableTable, the following methods should be implemented:

getRowType
- RelDataType getRowType(RelDataTypeFactory typeFactory)
- Description:
  - Returns this table’s row type. This is a struct type whose fields describe the names and types of the columns in this table. The implementer must use the type factory provided. This ensures that the type is converted into a canonical form; other equal types in the same query will use the same object.
- Parameters:
  - typeFactory: Type factory with which to create the type.
- Returns:
  - Row type
getStatistic
- Statistic getStatistic()
- Description:
  - Statistic getStatistic()
- Returns:
  - Statistic
getJdbcTableType
- Description:
  - Returns the type of table.
- Returns:
  - Table type.
sRolledUp
- Description:
  - Determines whether the given column has been rolled up.
- Returns:
  - true if the column is rolled up.
rolledUpColumnValidInsideAgg
- Description:
  - Determines whether the given rolled up column can be used inside the given aggregate function. You can assume that isRolledUp(column) is true.
- Parameters:
  - column - The column name for which isRolledUp is true.
  - call - The aggregate call.
  - parent - Parent node of call in the SqlNode tree.
  - config - Config settings. May be null.
- Returns:
  - true if the given aggregate call is valid.

All the table function classes must belong to the package com.leanxcale in order to have them auto-loaded when LeanXcale starts.

Find below a complete example:

package com.leanxcale;

import com.leanxcale.QE.tableFunction.AutoloadTableFunction;
import org.apache.calcite.DataContext;
import org.apache.calcite.config.CalciteConnectionConfig;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.calcite.util.ImmutableBitSet;
import java.util.Collections;

public class SeriesGeneratorTableFunction implements AutoloadTableFunction {

  public static String getFunctionName() {
    return "series_generator";
  }

  public ScannableTable eval(Integer start, Integer stop, Integer step) {
    int actualStep = (null == step) ? 1 : step;
    int numResults = (1 + (stop - start)/actualStep);
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        Object[][] results = new Object[numResults][1];
        for (int inx = 0; inx < numResults; inx++) {
          results[inx][0] = start + inx*actualStep;
        }
        return Linq4j.asEnumerable(results);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
                .add("SERIES", SqlTypeName.INTEGER)
                .build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(numResults, Collections.singletonList(ImmutableBitSet.of(numResults)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

Once you do mvn package, a .jar file gets created with the custom table functions you defined. This .jar should be included in the Query Engine’s classpath (lxs/LX-BIN/lib). The Query Engine should be restarted after adding the table function .jar file. For the cloud version, the leanxcale console will provide the means to add the table function code to your database.

2. Using Table Functions

You can use a table function in the FROM clause of a query as the first argument to the table function. In this example, series_generator is the table function:

SELECT SERIES FROM table(series_generator(1, 10, 1)) WHERE SERIES > 3;

+--------+
| SERIES |
+--------+
| 4      |
| 5      |
| 6      |
| 7      |
| 8      |
| 9      |
| 10     |
+--------+
7 rows selected (0.008 seconds)

Take into account that if the parameter you’re passing to your table function is a string, you’ll need to use single quotes. Double quotes are not allowed. This is an example:

SELECT * FROM table(my_new_tablefunction('test_string'));

But this does not work:

SELECT * FROM table(my_new_tablefunction("test_string"));

3. Table Function Utils

In order to ease the writing of table function, you can use the TableFunctionUtils class. You will be able to execute SQL queries within your Table Function.

ResultType

Function

Description

TableFunctionUtils

utils()

Static function to build an instance for the table function utils according to the current connection

Connection

getConnection()

Provide an connection to perfom SQL actions. This connection cannot be committed or rollbacked

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID ////

Session

session(Settings Settings)

Creates a new session to the KiVi database. It shares the current KiVi connection.

Session

session()

Creates a new session to the KiVi database. It shares the current KiVi connection and transaction to connect through the direct API. ////

Table

getTable(String tableName)

Provide a table object that exist within current transaction context so you can perform basic operations on it

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

Find below an example:

public class TableFunctionUtilsExample implements AutoloadTableFunction {
  public static final String functionName = "TableFunctionUtilsExample";

  public static String getFunctionName() {
    return functionName;
  }

  public static String getDescription() {
    return functionName;
  }

  public ScannableTable eval(Object parentTableName, Object sonTableName) {
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        List<Object[]> result = new ArrayList<>();

        TableFunctionUtils utils = TableFunctionUtils.utils();
        Settings settings = new Settings();
        settings.setDatabase(utils.getDbName());
        settings.setSchema(utils.getSchemaName());
        settings.disableConflictChecking();
        settings.disableLogging();
        try (Session session = utils.session(settings)) {
          Table table = session.database().getTable((String) parentTableName);
          try (ResultSet rs = utils.getConnection().createStatement().executeQuery("select i_name, max_price from " + sonTableName)) {
            while (rs.next()) {
              Tuple tuple = table.find()
                  .filter(Filters.eq("i_name", rs.getString(1)).and(Filters.eq("i_price", rs.getInt(2))))
                  .iterator().next();
              Object[] row = new Object[3];
              row[0] = tuple.get("i_id");
              row[1] = tuple.get("i_name");
              row[2] = tuple.get("i_price");
              result.add(row);
            }
          }
        } catch (Exception exception) {
          throw new LeanxcaleRuntimeException(exception);
        }

        return Linq4j.asEnumerable(result);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        RelDataTypeFactory.Builder builder = typeFactory.builder();
        builder.add("ID", SqlTypeName.INTEGER);
        builder.add("NAME", SqlTypeName.VARCHAR);
        builder.add("PRICE", SqlTypeName.INTEGER);
        return builder.build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(1, Collections.singletonList(ImmutableBitSet.of(1)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

GIS/Spatial Support

LeanXcale’s GIS support and LeanXcale’s high performance over GIS data is based a search algorithm based on Geohashingh. Geohash is a public domain geocode system invented in 2008 by Gustavo Niemeyer and G.M. Morton that encodes a geographic location into a short string of letters and digits. Geohashes offer properties like arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size with a gradual lose of precision. LeanXcale leverages this gradual precision to perform ST geometric and geographic operations over GIS data following the Geohash property that lies in nearby places will often present similar prefixes. In particular, LeanXcale’s geometry predicates and operators exercise this property.

In this section, we will follow the following conventions:

geom is a GEOMETRY
geomCollection is a GEOMETRYCOLLECTION
point is a POINT
lineString is a LINESTRING
iMatrix is a DE-9IM intersection matrix
distance, tolerance, segmentLengthFraction, offsetDistance are of type double
dimension, quadSegs, srid, and zoom are of type integer
layerType is a character string
gml is a character string containing Geography Markup Language (GML)
wkt is a character string containing well-known text (WKT)
wkb is a binary string containing well-known binary (WKB)

In the “C” (for “compatibility”) column, “o” indicates that the function implements the OpenGIS Simple Features Implementation Specification for SQL, version 1.2.1, while “p” indicates that the function is a PostGIS extension to OpenGIS.

1. Geohash Indexes

LeanXcale can speed up queries over geospatial data by using Geohash indexing. Geohash is an encoding geographic location on Base 32 where the even bits represent the longitude precision and the odd bits represent the latitude precision. Geohash has some interesting properties so from an envelop you can narrow down the bins in which you have to look for geometries.

More information can be found in the following links:

The geohash index can be the primary key of a table or a secondary. In general the primary key performs better and is the preferred way to define a geom table.

To create a table with geohash you need the following syntax:

Create a table whose primary key is a Geohash key. This will create a hidden geohash field in your table that will be used as primary key

CREATE TABLE
	geohashedcountries(
		 countrycode VARCHAR,
		 geomLocation VARCHAR,
		 name VARCHAR,
	PRIMARY GEOHASH KEY(geomLocation));

Create the table, this time with a secondary index for the geohash. Again a hidden geohash table will be used:

CREATE TABLE
	geohashedcountries(
		 countrycode VARCHAR,
		 geomLocation VARCHAR,
		 name VARCHAR,
	PRIMARY KEY(name),
	GEOHASH(geomLocation));

Instead of a geometry field which can be an arbitrary geometry, you may use two fields that will be used as latitude and longitude:

CREATE TABLE geohashedcitypoints(
  citycode VARCHAR,
  latitude DOUBLE,
  longitude DOUBLE,
  name VARCHAR,
  PRIMARY GEOHASH KEY(latitude, longitude));

When running a geo query, LeanXcale’s query optimizer will automatically detect the Geohash index and use it to narrow down your search:

EXPLAIN PLAN FOR (
	SELECT name FROM geohashedcountries
	WHERE ST_Contains(
		ST_Buffer(ST_MakePoint(cast(-3.67 as double), cast(40.42 as double)), cast(0.5 as double)),
			ST_GeomFromText(geomLocation)
  ));

PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[-3.67:DECIMAL(19, 0)], expr#4=[40.42:DECIMAL(19, 0)], expr#5=[ST_MAKEPOINT($t3, $t4)], expr#6=[0.5:DOUBLE], expr#7=[ST_BUFFER($t5, $t6)], expr#8=[ST_GEOMFROMTEXT($t0)], expr#9=[ST_CONTAINS($t7, $t8)], NAME=[$t1], $condition=[$t9])
            KiviPKTableScanRel(table=[[leanxcale, ADHOC, GEOHASHEDCOUNTRIES, filter:ST_GH_in($2, ST_GH_minMaxRegions(ST_BUFFER(ST_MAKEPOINT(-3.67:DOUBLE(19, 0), 40.42:DOUBLE(19, 0)), 0.5:DOUBLE))), project:[1, 2, 3]]], project=

Most frequently, all the functionality for geohashing is done internally so there is no need to explicitly define any condition for geohash. Anyway, the following functions are supported to query through the Geohash indexes:

ResultType

Function

Description

boolean

ST_GH_in(String geohash, String[][] minMaxRegions)

Check if a string is between any region of the list

String[][]

ST_GH_minMaxRegions(Geom geom)

Calculate the min max geohash regions(bins) the geom belongs to

String[]

ST_GH_encodeGeom(Geom geom)

Encodes the given geometry into a list of geoHash that contains it. The first item in the list would be the center

String

ST_GH_minGH(String geohash)

Generate the min value in geohash region

String

ST_GH_maxGH(String geohash)

Generate the max value in geohash region

String

ST_GH_encodeLatLon(double latitude, double longitude, int precision)

Encodes the given latitude and longitude into a geohash with the indicated precision (number of characters==number of 5bits groups)

String

ST_GH_encodeLatLon(double latitude, double longitude)

Encodes the given latitude and longitude into a geohash. Default precision is 12

String

ST_GH_encodeFromBinaryString(String binaryString)

Encodes the given binary string into a geohash.

String

ST_GH_encodeFromLong(long hashVal, int precision)

Encodes the given long into a geohash.

Double[]

ST_GH_decode(String geoHash)

Decodes the given geohash into a latitude and longitude

long

ST_GH_decodeToLong(String geoHash)

Decodes the given geoHash into bits as long value

String

ST_GH_decodeToBinaryString(String geoHash)

Decodes the given geoHash into a binary string

String

ST_GH_adjacent(String geoHash)

Returns the 8 adjacent hashes in the following order: N, NE, E, SE, S, SW, W, NW

String

ST_GH_regionsWithinDistance(double latitude, double longitude, double distance)

Returns the hashes that include the points within the specified distamce

String[]

ST_GH_regionsWithinDistance(Geom geom, Object distance)

Returns the hashes that include the points within the specified distance from the Geom.

String

ST_GH_northernNeighbour(String geoHash)

Returns the immediate neighbouring hash to the north

String

ST_GH_southernNeighbour(String geoHash)

Returns the immediate neighbouring hash to the south

String

ST_GH_westernNeighbour(String geoHash)

Returns the immediate neighbouring hash to the west

String

ST_GH_easternNeighbour(String geoHash)

Returns the immediate neighbouring hash to the east

Double[]

ST_GH_boundingBox(String geoHash)

Return the list of geohash limits for: min Latitude, min Longitude, max Latitude, max Longitude

2. Geometry Conversion Functions (2D)

C Operator syntax Description

ST_AsText(geom)

Alias for ST_AsWKT

ST_AsWKT(geom)

Converts geom → WKT

ST_GeomFromText(wkt [, srid ])

Returns a specified GEOMETRY value from WKT representation

ST_LineFromText(wkt [, srid ])

Converts WKT → LINESTRING

ST_MLineFromText(wkt [, srid ])

Converts WKT → MULTILINESTRING

ST_MPointFromText(wkt [, srid ])

Converts WKT → MULTIPOINT

ST_MPolyFromText(wkt [, srid ])

Converts WKT → MULTIPOLYGON

ST_PointFromText(wkt [, srid ])

Converts WKT → POINT

ST_PolyFromText(wkt [, srid ])

Converts WKT → POLYGON

3. Geometry Creation Functions (2D)

C Operator syntax Description

ST_MakeLine(point1 [, point ]*)

Creates a line-string from the given POINTs (or MULTIPOINTs)

ST_MakePoint(x, y [, z ])

Alias for ST_Point

ST_Point(x, y [, z ])

Constructs a point from two or three coordinates

4. Geometry Properties (2D)

Operator syntax

Description

ST_Boundary(geom [, srid ])

Returns the boundary of geom

ST_Distance(geom1, geom2)

Returns the distance between geom1 and geom2

ST_GeometryType(geom)

Returns the type of geom

ST_GeometryTypeCode(geom)

Returns the OGC SFS type code of geom

ST_Envelope(geom [, srid ])

Returns the envelope of geom (which may be a GEOMETRYCOLLECTION) as a GEOMETRY

ST_X(geom)

Returns the x-value of the first coordinate of geom

ST_Y(geom)

Returns the y-value of the first coordinate of geom

5. Geometry Properties (3D)

Operator syntax

Description

ST_Is3D(s)

Returns whether geom has at least one z-coordinate

ST_Z(geom)

Returns the z-value of the first coordinate of geom

6. Geometry Predicates

Operator syntax

Description

ST_Contains(geom1, geom2)

Returns whether geom1 contains geom2

ST_ContainsProperly(geom1, geom2)

Returns whether geom1 contains geom2 but does not intersect its boundary

ST_Crosses(geom1, geom2)

Returns whether geom1 crosses geom2

ST_Disjoint(geom1, geom2)

Returns whether geom1 and geom2 are disjoint

ST_DWithin(geom1, geom2, distance)

Returns whether geom1 and geom are within distance of one another

ST_EnvelopesIntersect(geom1, geom2)

Returns whether the envelope of geom1 intersects the envelope of geom2

ST_Equals(geom1, geom2)

Returns whether geom1 equals geom2

ST_Intersects(geom1, geom2)

Returns whether geom1 intersects geom2

ST_Overlaps(geom1, geom2)

Returns whether geom1 overlaps geom2

ST_Touches(geom1, geom2)

Returns whether geom1 touches geom2

ST_Within(geom1, geom2)

Returns whether geom1 is within geom2

7. Geometry Operators (2D)

The following functions combine 2D geometries.

Operator syntax

Description

ST_Buffer(geom, distance [, quadSegs | style ])

Computes a buffer around geom

ST_Union(geom1, geom2)

Computes the union of geom1 and geom2

ST_Union(geomCollection)

Computes the union of the geometries in geomCollection

See also: the ST_Union aggregate function.

8. Geometry Projection Functions

Operator syntax

Description

ST_SetSRID(geom, srid)

Returns a copy of geom with a new SRID

ST_Transform(geom, srid)

Transforms geom from one coordinate reference system (CRS) to the CRS specified by srid

Statistics

Statistics are required to have good performance on the queries. These statistics are used by the SQL planner to choose the optimal execution plan for an SQL query and the current data. The syntax is:

analyzeStatistics=
ANALYZE STATISTICS FOR
	[TABLE table_name]
	| [ALL TABLES];

This process scans the full table and generates internal statistics that are persisted. The length of this process depends on the size of the table since it iterates the whole table, row per row. We recommend it to do it periodically and after a massive data ingestion. A rule of thumb is when the accumulated changes since the last running of the statistics reach 10% of the data then the statistics should be run.

Hints

A hint is an instruction to the optimizer. When writing SQL, you may know information about the data that is unknown to the optimizer. Sometimes the optmizer might not generate the most optimal query plan. Hints enable to change aspects of the query plan generated by the optimizer.

There are two ways to define a hint:

Inline hints
Context hints

1. Inline Hints

Inline hints are defined as special comments. Depending on the hint type, it can be written in two locations:

Query Hint: right after the SELECT keyword.
Table Hint: right after the referenced table name.

Here you can see the kinds of hint:

select /*+ fixJoinOrder */ sum(a.i) from t1 /*+ forceAccess(idx1) */ a, t2 b where a.v = b.v;

The syntax is as follows:

hintComment:
	'/*+' hint [, hint ]* '*/'

hint:
	hintName
	| hintName '(' optionKey '=' optionVal [, optionKey '=' optionVal ]* ')'
	| hintName '(' hintOption [, hintOption ]* ')'

optionKey:
	simpleIdentifier
	| stringLiteral

optionVal:
	stringLiteral

hintOption:
	simpleIdentifier
 	| numericLiteral
	| stringLiteral

2. Query Hints

2.1. Force the Use of an Index

This hint forces the access to a table through a defined index. The hint name is forceAccess. It takes two parameters:

Table name.
Index name.

This is an example forcing to use index idx1 to access table t1:

select /*+ forceAccess(t1, idx1) */ sum(i) from t1 a
	where a.i between 1 and 5 and a.i2 between 11 and 20;

2.2. Disable Pushdown

Queries in LeanXcale can be executed either on the query engine, in the storage engines, or both, depending on the query. The part of the query executed in the storage engines is said to be pushed down to the storage engines. Sometimes one might want to prevent pushing down operations to storage engines for a given table in a query. In general, this will negatively impact performance, so only use it, if you actually know what you are doing. This means that all the work will be done by the query engine except for the scans.

This is an example that disables the pushdown for table t1:

select /*+ disablePushdown(t1) */ sum(i) from t1 a where a.i
	between 1 and 5 and a.i2 between 11 and 20;

2.3. Disable Join Pushdown

This hint disables the use of relops query engine. In general, this will negatively impact performance. This means that all join actions will be executed by the Java query engine instead of relops (the C query engine).

This is an example:

select /*+ skip_relops */ sum(a.i) from t1 a, t2 b where a.v = b.v

2.4. Skip Online Aggregate

This hint disables translating table’s query into a query over any of its online aggregates. In general, this will negatively impact performance.

select /*+ skip_oa(item) */ max(i_price), min(i_price), count(*) from item;

2.5. Fix Join Order

This hint sets the order of the joins as they are written in the query. This hint works at query level, so it will affect to all the joins present in the query.

This is an example:

select /*+ fixJoinOrder */ sum(a.i) from t1 a, t2 b where a.v = b.v

2.6. Force Join Type

This hint allows to force the kind of join which will be used between two given tables. This is an example:

select /*+ forceJoinType(NESTED_LOOP, t1, t2) */ sum(a.i) from t1 a, t2 b where a.v = b.v;

Where JOIN_TYPE is one of the available types:

CORRELATE
MERGE_JOIN
NESTED_LOOP

2.7. Or To Union

This hint transforms OR conditions into UNION clauses. This is an example:

select /*+ orToUnion(t1) */ sum(i) from t1
	where i between 1 and 3 and v is not null or i between 18 and 20;

2.8. Or To Union All

This hints transforms OR conditions into UNION ALL clauses. This is an example:

select /*+ orToUnionAll(t1) */ sum(i) from t1
	where i between 1 and 3 and v is not null or i between 18 and 20;

3. Table Hints

3.1. Force the Use of an Index

This hint forces the access to a table through a defined index. To use it invoke the function as follows: An example is:

select sum(i) from t1 /*+ forceAccess(t1) */ a
	where a.i between 1 and 5 and a.i2 between 11 and 20;

3.2. Disable Pushdown

This hint disables the pushdown of operations to the storage engine for a given table. In general, this will negatively impact performance. This means that all the work will be done by the query engine except the scans. This is an example:

select sum(i) from t1  /*+ disablePushdown */ a
	where a.i between 1 and 5 and a.i2 between 11 and 20;

3.3. Skip Online Aggregate

This hint disables translating table’s query into a query over any of its online aggregates. This is an example:

select max(i_price), min(i_price), count(*) from item /*+ skip_oa */;

3.4. Or to Union

This hints transforms OR conditions into UNION clauses. This is an example:

select sum(i) from t1 /*+ orToUnion */
	where i between 1 and 3 and v is not null or i between 18 and 20;

3.5. Or to Union All

This is an example: This hints transforms OR conditions into UNION ALL clauses.

select sum(i) from t1 /*+ orToUnionAll */
	where i between 1 and 3 and v is not null or i between 18 and 20;

3.6. Batch Size

This hint applies only to DMLs and is equivalent to BATCH statements. The following statements are equivalent:

upsert into t2 /*+ batchSize(5000) */ (pk, v, fk)
	select i2, v, i from t1 where v is not null;
batch 5000 upsert into t2 (pk, v, fk)
	select i2, v, i from t1 where v is not null;

4. Context Hints

Context Hints are established within a session for a given connection and affect all queries issued in that session after being set. The actions that can be performed for context hints are:

Enable context hints for the connection
Disable context hints for the connection
List a context hint
Refresh plan cache
Set a context hint

In the following subsections we explain the different steps.

4.1. Enable Context Hints

In order to allow to use context hints, they have to be enabled. This can be done using enableHints table function as follows:

EXEC enableHints()

The output is:

Hints enabled for connection ...

4.2. Disable Context Hints

Disable the hints deletes all hints set in the session. However, the execution plans that have been created using queries with the context hints activated will remain in the plan cache.

This can be done using disableHints function as follows:

EXEC disableHints()

The output is:

Hints disabled for connection ...

4.3. List Context Hints

Lists hints is useful to know the defined context hints for the present connection, and to retrieve their ID (which can be used to remove them).

This can be done using "listHints" function as follows:

EXEC listHints()

The output will be a table with a row for each defined hint and three columns: ID that identifies the hint within the connection, the hint type, and the description of the hint.

4.4. Set Context Hints

In order to set a context hint you have to invoke the appropriate function for the hint you want to define. The following subsections explain each of the available hints.

4.4.1. Force the Use of an Index

This hint forces the access to a table through a defined index. It is used as follows:

EXEC forceAccess('TABLE_NAME','INDEX_NAME')

Where TABLE_NAME is the qualified name of the table (i.e., db-APP-CUSTOMER) and INDEX_NAME is the qualified name of the index (i.e., db-APP-CUSTOMERIDX). This function does not check the existence of the index nor the table so, if you force the access through an inexistent index, the queries that use this table will fail.

4.4.2. Disable Pushdown

This hint disables the pushdown of operations to the storage engine. In general, this will negatively impact performance. This means that all the work will be done by query engine except the scans that are always performed on the storage engine side.

It is used as follows:

EXEC disablePushdown('TABLE_NAME')

Where TABLE_NAME is the qualified name of the table (i.e. db-APP-CUSTOMER).

4.4.3. Fixed Join Order

This hint sets the order of the joins as they are written in the query. It is defined as follows:

EXEC fixJoinOrder()

4.4.4. Force Join Type

This hint allows to force the kind of join which will be used between two given tables. It is set by:

EXEC forceJoinType('JOIN_TYPE', 'TABLE_1', 'TABLE_2')

Where JOIN_TYPE is one of the available types:

CORRELATE,
MERGE_JOIN,
NESTED_LOOP

TABLE_1 and TABLE_2 are the unqualified names of the affected tables (i.e. CUSTOMER or WAREHOUSE). This function does not check that the chosen join type is available for a given pair of tables so if not the query will fail.

4.4.5. Or To Union

This hint transforms OR conditions into UNION clauses. It accepts a Boolean parameter which indicates whether the UNION will be a UNION ALL or not.

It is activated by:

EXEC orToUnion(all)

Where all is true for a UNION ALL or false otherwise.

4.5. Remove Context Hints

A hint can be removed for the present connection. But, if the hint has been used to execute a query and an execution plan has been created and stored in the query plan cache, remove the hint won’t remove the query plan and the queries will continue to use it for their execution.

Removing a hint is performed as follows:

EXEC removeHint(HINT_ID)

Where HINT_ID (an integer, without quotes) is the identifier of the hint that wants to be removed, and can be obtained from the listHints table function.

5. Clean Query Plan Cache

This table function removes all the query plans that have been cached for a given table (not only for this connection but for every query in the server). It can be useful to ensure that a new query plan will be calculated when a hint is added or removed. However, it has an associated overhead, it will force to generate the plan for all subsequent queries that is an expensive operation.

It is invoked as follows:

EXEC cleanPlanCache('TABLE_NAME')

Where TABLE_NAME is the qualified name of the table (i.e. db-APP-CUSTOMER) from which the query plans will be removed.

6. Enable/Disable Planner Logger

These two functions let the user to enable or disable the optimizer logger. To be able to use it, you have to define a logger in your log4j configuration for the class org.apache.calcite.plan.RelOptPlanner with the level you want.

When enabling it, the log level for the RelOptPlanner logger will be set to ALL, and the planner will log all the query optimization steps in the appender configured for the RelOptPlanner logger.

Use it with caution because the amount of generated data can affect negatively the performance of the query preparation.

They can be invoked as follows:

EXEC enablePlannerDebug()
EXEC disablePlannerDebug()

System Tables

LeanXcale’s query engine provides a series of virtual tables giving access to the metadata of stored databases, also known as data dictionaries in other databases.

All the system tables are part of a virtual schema called LXSYS. Each of the following subsections is devoted to each of the tables in the system.

1. LXSYS.TABLES

It shows all the tables. Initially, only metadata tables are shown. The columns of the table are as follows:

tableCat: Database on which the table resides.
tableSchem: Schema on which the table resides.
tableName: Name of the table.
owner: User owner of the table.
tableType: It is the kind of table. It can have the following values:
- SYSTEM TABLE.
- TABLE.
- SEQUENCE.
- VIEW.
comment: a comment made on the table.

Note that indexes do not appear in this list because they are not tables by themselves.

lx% select * from lxsys.tables;

tableCat    tableSchem    tableName                   owner      tableType     comment

  db          LXSYS         AGGREGATIONS                lxadmin    SYSTEM TABLE
  db          LXSYS         ALL_CONS_COLUMNS            lxadmin    SYSTEM TABLE
  db          LXSYS         AUTO_SPLITS                 lxadmin    SYSTEM TABLE
  db          LXSYS         COLUMNS                     lxadmin    SYSTEM TABLE
  db          LXSYS         COMPONENTS                  lxadmin    SYSTEM TABLE
  db          LXSYS         CONNECTIONS                 lxadmin    SYSTEM TABLE
  db          LXSYS         DATASTORE_STATS             lxadmin    SYSTEM TABLE
  db          LXSYS         FOREIGN_KEYS                lxadmin    SYSTEM TABLE
  db          LXSYS         GROUP_PERMISSION            lxadmin    SYSTEM TABLE
  db          LXSYS         INDEX_COLUMNS               lxadmin    SYSTEM TABLE
  db          LXSYS         INDEXES                     lxadmin    SYSTEM TABLE
  db          LXSYS         ONLINE_AGGREGATE_FILTERS    lxadmin    SYSTEM TABLE
  db          LXSYS         ONLINE_AGGREGATES           lxadmin    SYSTEM TABLE
  db          LXSYS         ONLINE_AGGREGATIONS         lxadmin    SYSTEM TABLE
  db          LXSYS         PLANS_CACHE                 lxadmin    SYSTEM TABLE
  db          LXSYS         PRIMARY_KEYS                lxadmin    SYSTEM TABLE
  db          LXSYS         QUERY_CACHE                 lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_CHECKS                lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_FUNCTIONS             lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_PARTITIONS            lxadmin    SYSTEM TABLE
  db          LXSYS         TABLE_STATS                 lxadmin    SYSTEM TABLE
  db          LXSYS         TABLES                      lxadmin    SYSTEM TABLE
  db          LXSYS         TRANSACTIONS                lxadmin    SYSTEM TABLE
  db          LXSYS         TRIGGER_FUNCTIONS           lxadmin    SYSTEM TABLE
  db          LXSYS         TRIGGERS                    lxadmin    SYSTEM TABLE
  db          LXSYS         TYPE_INFO                   lxadmin    SYSTEM TABLE
  db          LXSYS         USER_GROUP                  lxadmin    SYSTEM TABLE
  db          LXSYS         VIEW_TABLES                 lxadmin    SYSTEM TABLE

2. LXSYS.COLUMNS

This system table shows all the columns of all tables in the system. It has the following columns:

tableCat: Database on which the table resides.
tableSchem: Schema on which the table resides.
tableName: Name of the table.
columnName: Name of the column.
dataType: Type of the column as an INTEGER. The possible values are:
- 4: INTEGER
- 12 VARCHAR
- …
typeName: It is the SQL type name. If NOT NULL constraint is set on the column is shown after the type name. E.g.: INTEGER NOT NULL.
columnSize: For VARCHAR columns it provides the max length. For DECIMAL columns it provides the length of the non decimal part. For TIMESTAMP columns it is 6. For other column types is -1.
bufferLength. Deprecated. Do not use it.
decimalDigits. For DECIMAL columns it provides the number of fractional digits.
numPrecRadix. Indicates the base of the numeric columns. It is always 10.
nullable. Deprecated. 1 if it can be NULL, 0 if it has NON NULL constraint.
remarks. Comment on the column.
columnDef. Deprecated. Do not use it.
sqlDataType. Deprecated. Do not use it.
sqlDatetimeSub. Deprecated. Do not use it.
charOctetLength. For VARCHAR columns it shows the max length in bytes.
ordinalPosition. The position of the column in the create statement. The first column has position 1.
isNullable. YES if it can be NULL, NO if it has NON NULL constraint.
scopeCatalog. Deprecated. Do not use it.
scopeSchema. Deprecated. Do not use it.
scopeTable. Deprecated. Do not use it.
sourceDataType Deprecated. Do not use it.
isAutoincrement. For non-system tables it is YES if the column is auto-incremental, NO, otherwise. For system tables is blank.
isGeneratedcolumn. For non-system tables it is YES if the column is auto-generated, NO, otherwise. For system tables is blank.

Here there is a fragment of the table contents after creating a user table T with two columns PK and C:

lx% select * from lxsys.columns;

  tableCat    tableSchem    tableName                   columnName             dataType    typeName                               columnSize    bufferLength    decimalDigits    numPrecRadix    nullable    remarks    columnDef    sqlDataType    sqlDatetimeSub    charOctetLength    ordinalPosition    isNullable    scopeCatalog    scopeSchema    scopeTable    sourceDataType    isAutoincrement    isGeneratedcolumn

  db          APP           T                           PK                     4           INTEGER                                -1                                             10              1                                                                    -1                 1                  YES                                                                          NO                 NO
  db          APP           T                           C                      12          VARCHAR(4096) CHARACTER SET "UTF-8"    4096                                           10              1                                                                    4096               2                  YES                                                                          NO                 NO
  db          LXSYS         AGGREGATIONS                name                   12          VARCHAR                                -1                                             10              1                                                                    -1                 1                  YES
...

3. LXSYS.PRIMARY_KEYS

It shows the primary keys for user created tables.

The column names of this system table are:

columnName: Name of the column.
keySeq. 1 if it is a sequence, 0 otherwise.
pkName. Same as columnName.
tableCat: Database on which the table resides.
tableSchem: Schema on which the table resides.
tableName: Name of the table.

This is the contents after creating a table t with a column pk as primary key:

lx% select * from lxsys.primary_keys;

  columnName    keySeq    pkName    tableCat    tableSchem    tableName

  PK            0         PK        db          APP           T

4. LXSYS.FOREIGN_KEYS

It contains all the declared foreign keys in the system for user created tables. The column names of this system table are:

pktableCat: Database of the table with the referenced primary key.
pktableSchem: Schema of the table with the referenced primary key.
pktableName: Table with the referenced primary key.
pkcolumnName: Column name of the referenced primary key.
fktableCat: Database of the table with the foreign key.
fktableSchem: Schema of the table with the foreign key.
fktableName: Table with the foreign key.
fkcolumnName: Column with the foreign key.
keySeq: 1 if is a sequence, 0 otherwise.
updateRule: Rule for updates on the primary key.
deleteRule: Rule for deletes on the primary key.
fkName: Name of the foreign key.
pkName: Name of the primary key.
deferrability: 1 is the foreign key check is delayed to the commit of the transaction, 0 if it is checked immediately after changing the primary key. Currently, not supported. An example of this table after creating a table with a foreign key to another table:

lx% select * from lxsys.foreign_keys;

  pktableCat    pktableSchem    pktableName    pkcolumnName    fktableCat    fktableSchem    fktableName    fkcolumnName    keySeq    updateRule    deleteRule    fkName                   pkName    deferrability

  db            APP             T1             PK              db            APP             T2             FK_TO_T1        0         3             3             FK_db-APP-T2_FK_TO_T1    [PK]      7

5. LXSYS.INDEXES

It shows all the indexes created for user created tables in the system.

The columns are:

tableCat: Database on which the table resides.
tableSchem: Schema on which the table resides.
tableName: Name of the table.
non_unique: Non unique constraint.
index_qualifier: ??
index_name: Name of the index.
type: ??
ordinal_position: ??
column_name: Name of the indexed column.
asc_or_desc: 1 if ascending ordered, and 0 if descending ordered.
cardinality: Cardinality of the index.
pages: ??
filter_condition: ??

An example of the table contents with an index created for a table is:

> select * from LXSYS.indexes;

6. LXSYS.TABLE_CHECKS

It shows all the table integrity constraints defined in user created tables (NOT NULL, …). The columns are:

name: Constraint name.
expression: Constraint description.
tableCat: Database on which the table resides.
tableSchem: Schema on which the table resides.
tableName: Name of the table.

7. LXSYS.TRANSACTIONS

It shows all active transactions in the Query Engine relating the information to the connection. The following fields are showed:

txn. The transaction id.
state: State of the transaction. // ?? possible values??
conn: Connection id.
session: Session id.
mode: Isolation level of the session.
uid: user that created the session.
startTs: Start timestamp of the transaction.
time: Epoch with the timestamp at which the transaction was started.
commitTs: If the transaction already committed its commit timestamp, -1 otherwise.

lx% select * from LXSYS.TRANSACTIONS;

  txn           state     conn                                    session    mode                  uid        startTs       time               commitTs

  1031227002    ACTIVE    681efbc4-6f56-41ac-b11f-3dc32bbac7f9    6          SNAPSHOT_ISOLATION    lxadmin    1031226001    435391393809375    -1

8. LXSYS.CONNECTIONS

It contains all the currently open connections with the database.

The columns are:

connectionId: The identifier of the connection.
isCurrent: true if it is the one that you are using, false otherwise.
uid: The user who created the connection.
isolationLevel: Isolation level of the connection.
created: When the connection was created.
used: Last time the connection was used.
hostIP: the IP from which the client is connected.
status: A value indicating the status that can be:
- ACTIVE when the connection is running an action.
- IDLE when the connection is not doing anything.
- FETCHING when the server is sending the resultset to the client side and the client side is iterating the resultset.
lastAction: Last action executed.
lastSql: Last SQL statement executed.
database: Database to which the connection was established.
schema: Schema to which the connection was established.

This is an example of the table contents with just the connection being used by the SQL console:

lx% select * from LXSYS.CONNECTIONS;

connectionId                            isCurrent    uid        isolationLevel        created                    used                       hostIP    status    lastAction        lastSql                            database    schema

f262cbab-f6db-4362-aac4-7727a32220ee    true         lxadmin    SNAPSHOT_ISOLATION    2024-02-08 13:45:04.662    2024-02-08 19:18:24.019              ACTIVE    EXECUTE_DIRECT    select * from LXSYS.CONNECTIONS    db          APP

9. LXSYS.ONLINE_AGGREGATIONS

It contains the user defined online aggregations.

Its columns are:

onlineRelationTable: Name of the online aggregation table.
ordinalPosition: ??
name: Name of the online aggregate column.
aggType: Kind of aggregation that can be:
- COUNT.
- SUM.
- MAX.
- MIN.
aggExpression: Expression over which the online aggregate is applied.
tableCat: Database where is defined.
tableSchem: Schema where is defined.
tableName: Name of the parent table over which the online aggregation has been defined.

An example follows:

lx% select * from LXSYS.ONLINE_AGGREGATIONS where onlineRelationTable = 'AGG_TRIPS' order by ordinalPosition;

 onlineRelationTable  ordinalPosition  name        aggType  aggExpression          tableCat  tableSchem  tableName
 AGG_TRIPS            1                DURATION    SUM      CAST2INT[$DURATION]    demo      APP         TRIPS
 AGG_TRIPS            2                MAX_START   MAX      $START_TS              demo      APP         TRIPS
 AGG_TRIPS            3                MIN_START   MIN      $START_TS              demo      APP         TRIPS
 AGG_TRIPS            4                PASSENGERS  SUM      CAST2INT[$PASSENGERS]  demo      APP         TRIPS
 AGG_TRIPS            5                COUNT_ALL   COUNT                           demo      APP         TRIPS

10. LXSYS.ONLINE_AGGREGATES

It shows information about the group by expressions for existing online aggregation relations

lx% select * from LXSYS.ONLINE_AGGREGATES where onlineRelationTable = 'AGG_TRIPS' order by ordinalPosition;

 onlineRelationTable  ordinalPosition  name       groupBy                                                                    tableCat  tableSchem  tableName
 AGG_TRIPS            1                END_YEAR   TS_EXTRACT['YEAR', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]   demo      APP         TRIPS
 AGG_TRIPS            2                END_MONTH  TS_EXTRACT['MONTH', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]  demo      APP         TRIPS
 AGG_TRIPS            3                END_DAY    TS_EXTRACT['DAY', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]    demo      APP         TRIPS
 AGG_TRIPS            4                END_HOUR   TS_EXTRACT['HOUR', AUTOCAST[($START_TS + AUTOCAST[(1000 * $DURATION)])]]   demo      APP         TRIPS

11. LXSYS.ONLINE_AGGREGATE_FILTERS

It shows information about the filter expressions for existing online aggregation relations

lx% select * from LXSYS.ONLINE_AGGREGATE_FILTERS where onlineRelationTable = 'AGG_TRIPS';

 onlineRelationTable  filter                                  tableCat  tableSchem  tableName
 AGG_TRIPS            (TS_EXTRACT['YEAR', $START_TS] = 2021)  demo      APP         TRIPS

12. LXSYS.AGGREGATIONS

It shows information about the aggregation columns for user defined tables.

lx% select * from LXSYS.AGGREGATIONS;

name         aggType  tableCat  tableSchem  tableName
 PASSENGERS  SUM      demo      APP         MANUAL_AGG_TRIPS
 DURATION    SUM      demo      APP         MANUAL_AGG_TRIPS
 MIN_START   MIN      demo      APP         MANUAL_AGG_TRIPS
 COUNT_ALL   SUM      demo      APP         MANUAL_AGG_TRIPS
 MAX_START   MAX      demo      APP         MANUAL_AGG_TRIPS

13. LXSYS.TABLE_PARTITIONS

It shows information about the table partitions and the split points for each partition.

The columns are:

catalog: Database name.
schema: Schema name.
tableName: Table name.
regionId: Id of the partition/region/fragment.
address: Id of the storage server holding the partition.
minValue: Min value contained by the partition. MIN means the lowest possible value.
maxValue: Max value contained by the partition. MAX means the highest possible value.

This is an example with a table with a single partition containing all data.

lx% select * from LXSYS.TABLE_PARTITIONS order by tableName;

  catalog    schema    tableName       regionId      address    minValue    maxValue

  db         APP       T            1649267441666    ds100      MIN         MAX

14. LXSYS.TABLE_STATS

It contains the collected statistics and their values.

Its columns are:

catalog: Database name.
schema: Schema name.
tableName: Table name.
regionId: Id of the partition/region/fragment.
address: Id of the storage server holding the partition.
stat: Name of the stat.
value: Value of the stat.

This is an example of the stats of an empty table:

lx% select * from LXSYS.TABLE_STATS;

  catalog    schema    tableName    regionId         address    stat        value

  db         APP       T            1649267441666    ds100      nndwr       1
  db         APP       T            1649267441666    ds100      nmbs        3
  db         APP       T            1649267441666    ds100      rtime       1707415699
  db         APP       T            1649267441666    ds100      depth       1
  db         APP       T            1649267441666    ds100      nblkwr      4
  db         APP       T            1649267441666    ds100      noldused    1
  db         APP       T            1649267441666    ds100      nsplit      1
  db         APP       T            1649267441666    ds100      noldblks    1
  db         APP       T            1649267441666    ds100      mtime       1707415699
  db         APP       T            1649267441666    ds100      stime       1707415699
  db         APP       T            1649267441666    ds100      wtime       1707415699
  db         APP       T            1649267441666    ds100      nnd         1
  db         APP       T            1649267441666    ds100      nused       1
  db         APP       T            1649267441666    ds100      ctime       1707415699
  db         APP       T            1649267441666    ds100      ncache      2
  db         APP       T            1649267441666    ds100      nblks       3

15. LXSYS.DATASTORE_STATS

It shows information about table statistics kept by the storage servers.

Its columns are:

address: Id of the storage server holding the partition.
stat: Name of the stat.
value: Value of the stat.

This shows the number of blocks held in each storage server:

lx% select * from LXSYS.DATASTORE_STATS where stat like '%nblks%';

  address    stat          value

  ds103      mblk/nblks    8192
  ds102      mblk/nblks    8192
  ds101      mblk/nblks    8192
  ds100      mblk/nblks    8192

16. LXSYS.USER_GROUP

Deprecated. Not to be used.

17. LXSYS.GROUP_PERMISSION

It contains the permissions that have been granted. Its columns are:

groupId
path
catalog
schema
entityName
perms

lx% select * from LXSYS.GROUP_PERMISSION where groupId = 'APP';
 groupId  path  catalog  schema  entityName  isRead  isWrite  isAlter
 APP      /seq  demo     APP                 true    true     true
 APP      /tbl  demo     APP                 true    true     true

18. LXSYS.PLANS_CACHE

It shows execution plans that are currently cached.

Its columns are:

initialPlan: Initial query plan after compilation.
executionPlan: Final query plan produced by the optimizer.
tableCat: Database name.
tableSchem: Schema name.
tableName: Table name.

An example with a fragment of the contents is:

lx% select * from LXSYS.PLANS_CACHE;

initialPlan                                            executionPlan                                                        tableCat    tableSchem    tableName

LogicalProject(inputs=[0..4])
  LogicalTableScan(table=[[db, LXSYS, PLANS_CACHE]])   EnumerableTableScan.ENUMERABLE.[](table=[db, LXSYS, PLANS_CACHE])    db          LXSYS         PLANS_CACHE
...

19. LXSYS.TRIGGER_FUNCTIONS

It contains all predefined and user defined trigger functions.

Its columns are:

triggerName: Name of the trigger function.
description: Short description of the purposed of the trigger.

An example is:

%lx select * from LXSYS.TRIGGER_FUNCTIONS;

  triggerName                  description

  HashFieldTrigger             Hash Field Trigger
  UUIDTrigger                  UUID Trigger
  AutoGeohashTrigger           Auto Geohash Trigger
  AutoIncrementTrigger         Auto Increment Trigger
  DefaultValueTrigger          Default Value Trigger
  Trigram_Trigger              Trigger to maintain trigram index
  ForceAutoIncrementTrigger    Force Auto Increment Trigger
  DeleteForeignKeyTrigger      Check for Referential Integrity when deleting row from referenced table
  ForeignKeyTrigger            Check for Referential Integrity

20. LXSYS.TRIGGERS

It contains all predefined and user defined trigger functions.

Its columns are:

triggerName: Name of the trigger.
triggerFunction: Name of the trigger function.
args: parameters of the trigger.
isBefore: true if executed before the triggering action, false otherwise.
isInsert: true if the triggering action is an insert, false otherwise.
isUpdate: true if the triggering action is an update, false otherwise.
isDelete: true if the triggering action is a delete, false otherwise.
priority: Specifies the precedence of a trigger compared to other triggers that are defined on the same event. Triggers with lower priority values are executed before triggers with higher priority values.
isEnable: true if it is currently enabled, false if disabled.
tableCat: Database name.
tableSchem: Schema name.
tableName: Table name.

%lx select * from LXSYS.TRIGGERS;
triggerName              triggerFunction       args                      isBefore  isInsert  isUpdate  isDelete  priority  tableCat            tableSchem  tableName

 I_autoIncrementTrigger  autoIncrementTrigger  I,NOUSR.TAUTO_IDENTITY_I  true      true      false     false     100       contrib_regression  NOUSR       TAUTO
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       TESTTAB_FK
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       BYTEATAB
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       BOOLTAB
 _UUID__UUIDTrigger      UUIDTrigger           _UUID_                    true      true      false     false     100       contrib_regression  NOUSR       BYTEATAB2

21. LXSYS.TABLE_FUNCTIONS

It shows available table functions.

Its columns are:

functionName: Name of the table function.
parameters: List of pararmeters.
description: Description of how to use it.

An example with a short fragment of the contents is:

lx% Select * from LXSYS.TABLE_FUNCTIONS;

functionName            parameters           description
oaValidator             (String, String)     oaValidator receives Online Aggregator Name ...
...

22. LXSYS.VIEW_TABLES

It shows user defined views.

Its columns are:

query: The query associated to the view.
holder: //?? what is this
tableCat: Database name.
tableSchem: Schema name.
tableName: Table name.

And example is:

lx% Select * from LXSYS.VIEW_TABLES;

query                             holder        tableCat  tableSchem  tableName
SELECT * FROM "T1" WHERE "I" = 8  adhoc-APP-T1  adhoc     APP         MYVIEW

Table Functions

1. How to Develop and Deploy a Table Function

First you need to add the LeanXcale maven repository and the qe-tableFunctions dependency to your pom.xml file:

<repositories>
    <repository>
        <id>maven-releases</id>
        <url>https://nexus.leanxcale.com/repository/maven-releases</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.leanxcale</groupId>
        <artifactId>qe-tableFunctions</artifactId>
        <version>1.7.6</version>
    </dependency>
</dependencies>

In order to create a custom table function, a Java class has to be written implementing the AutoloadTableFunction interface and defining the following methods:

Function

Description

Return

getFunctionName()

Define the custom table function name

String

getDescription()

Provides the custom table function description

String

eval(<table function arguments>)

Generate the collection of rows to be queried

ScannableTable

getRowType
- RelDataType getRowType(RelDataTypeFactory typeFactory)
- Description:
  - Returns this table’s row type. This is a struct type whose fields describe the names and types of the columns in this table. The implementer must use the type factory provided. This ensures that the type is converted into a canonical form; other equal types in the same query will use the same object.
- Parameters:
  - typeFactory: Type factory with which to create the type.
- Returns:
  - Row type
getStatistic
- Statistic getStatistic()
- Description:
  - Statistic getStatistic()
- Returns:
  - Statistic
getJdbcTableType
- Description:
  - Returns the type of table.
- Returns:
  - Table type.
sRolledUp
- Description:
  - Determines whether the given column has been rolled up.
- Returns:
  - true if the column is rolled up.
rolledUpColumnValidInsideAgg
- Description:
  - Determines whether the given rolled up column can be used inside the given aggregate function. You can assume that isRolledUp(column) is true.
- Parameters:
  - column - The column name for which isRolledUp is true.
  - call - The aggregate call.
  - parent - Parent node of call in the SqlNode tree.
  - config - Config settings. May be null.
- Returns:
  - true if the given aggregate call is valid.

All the table function classes must belong to the package com.leanxcale in order to have them auto-loaded when LeanXcale starts.

Find below a complete example:

package com.leanxcale;

import com.leanxcale.QE.tableFunction.AutoloadTableFunction;
import org.apache.calcite.DataContext;
import org.apache.calcite.config.CalciteConnectionConfig;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.calcite.util.ImmutableBitSet;
import java.util.Collections;

public class SeriesGeneratorTableFunction implements AutoloadTableFunction {

  public static String getFunctionName() {
    return "series_generator";
  }

  public ScannableTable eval(Integer start, Integer stop, Integer step) {
    int actualStep = (null == step) ? 1 : step;
    int numResults = (1 + (stop - start)/actualStep);
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        Object[][] results = new Object[numResults][1];
        for (int inx = 0; inx < numResults; inx++) {
          results[inx][0] = start + inx*actualStep;
        }
        return Linq4j.asEnumerable(results);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
                .add("SERIES", SqlTypeName.INTEGER)
                .build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(numResults, Collections.singletonList(ImmutableBitSet.of(numResults)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

2. Using Table Functions

You can use a table function in the FROM clause of a query as the first argument to the table function. In this example, series_generator is the table function:

SELECT SERIES FROM table(series_generator(1, 10, 1)) WHERE SERIES > 3;

+--------+
| SERIES |
+--------+
| 4      |
| 5      |
| 6      |
| 7      |
| 8      |
| 9      |
| 10     |
+--------+
7 rows selected (0.008 seconds)

Take into account that if the parameter you’re passing to your table function is a string, you’ll need to use single quotes. Double quotes are not allowed. This is an example:

SELECT * FROM table(my_new_tablefunction('test_string'));

But this does not work:

SELECT * FROM table(my_new_tablefunction("test_string"));

3. Table Function Utils

In order to ease the writing of table function, you can use the TableFunctionUtils class. You will be able to execute SQL queries within your Table Function.

ResultType

Function

Description

TableFunctionUtils

utils()

Static function to build an instance for the table function utils according to the current connection

Connection

getConnection()

Provide an connection to perfom SQL actions. This connection cannot be committed or rollbacked

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID ////

Session

session(Settings Settings)

Creates a new session to the KiVi database. It shares the current KiVi connection.

Session

session()

Creates a new session to the KiVi database. It shares the current KiVi connection and transaction to connect through the direct API. ////

Table

getTable(String tableName)

Provide a table object that exist within current transaction context so you can perform basic operations on it

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

Find below an example:

public class TableFunctionUtilsExample implements AutoloadTableFunction {
  public static final String functionName = "TableFunctionUtilsExample";

  public static String getFunctionName() {
    return functionName;
  }

  public static String getDescription() {
    return functionName;
  }

  public ScannableTable eval(Object parentTableName, Object sonTableName) {
    return new ScannableTable() {
      @Override
      public Enumerable<Object[]> scan(DataContext dataContext) {
        List<Object[]> result = new ArrayList<>();

        TableFunctionUtils utils = TableFunctionUtils.utils();
        Settings settings = new Settings();
        settings.setDatabase(utils.getDbName());
        settings.setSchema(utils.getSchemaName());
        settings.disableConflictChecking();
        settings.disableLogging();
        try (Session session = utils.session(settings)) {
          Table table = session.database().getTable((String) parentTableName);
          try (ResultSet rs = utils.getConnection().createStatement().executeQuery("select i_name, max_price from " + sonTableName)) {
            while (rs.next()) {
              Tuple tuple = table.find()
                  .filter(Filters.eq("i_name", rs.getString(1)).and(Filters.eq("i_price", rs.getInt(2))))
                  .iterator().next();
              Object[] row = new Object[3];
              row[0] = tuple.get("i_id");
              row[1] = tuple.get("i_name");
              row[2] = tuple.get("i_price");
              result.add(row);
            }
          }
        } catch (Exception exception) {
          throw new LeanxcaleRuntimeException(exception);
        }

        return Linq4j.asEnumerable(result);
      }

      @Override
      public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        RelDataTypeFactory.Builder builder = typeFactory.builder();
        builder.add("ID", SqlTypeName.INTEGER);
        builder.add("NAME", SqlTypeName.VARCHAR);
        builder.add("PRICE", SqlTypeName.INTEGER);
        return builder.build();
      }

      @Override
      public Statistic getStatistic() {
        return Statistics.of(1, Collections.singletonList(ImmutableBitSet.of(1)));
      }

      @Override
      public Schema.TableType getJdbcTableType() {
        return Schema.TableType.TABLE;
      }

      @Override
      public boolean isRolledUp(String s) {
        return false;
      }

      @Override
      public boolean rolledUpColumnValidInsideAgg(String s, SqlCall sqlCall, SqlNode sqlNode, CalciteConnectionConfig calciteConnectionConfig) {
        return false;
      }
    };
  }
}

DML Triggers

CREATE TRIGGER myTrigger BEFORE INSERT UPDATE ON t
FOR EACH ROW EXECUTE genericBackupTrigger('t_backup,ts')
PRIORITY 10;

DISABLE TRIGGER myTrigger ON t;

ENABLE TRIGGER myTrigger ON t;

DROP TRIGGER myTrigger ON TABLE t;

1. CREATE TRIGGER Statement

The syntax to create a trigger is as follow:

createTrigger:
      CREATE TRIGGER [ IF NOT EXISTS ] triggerName
      (BEFORE | AFTER)
      [INSERT] [UPDATE] [DELETE]
      ON tableName FOR EACH ROW EXECUTE
      triggerFunction ['(' arg ')']
      [PRIORITY priority]

where

triggerName is the name used by the user to identify the trigger.
tableName is the table where the trigger applies
triggerFunction is the Trigger Function name as the result of getTriggerName(). See Trigger Function interface
arg is the trigger function argument as string literal passed in triggerExecutor(). Check [triggerExecutor params]
priority is an integer literal used to order teh trigger execution in case that the table have more than one. Note that the resulting new values from one trigger will be passed to the execution of the next one.

Note that you can define 1 to 3 dml events among INSERT, UPDATE and DELETE.

2. Trigger Functions System Table

To find out the loaded trigger functions, you can query TRIGGER_FUNCTIONS system table.

> select * from LXSYSMETA.TRIGGER_FUNCTIONS;
+---------------------------+-------------------------------------------------------------------------+
|        triggerName        |                               description                               |
+---------------------------+-------------------------------------------------------------------------+
| HashFieldTrigger          | Hash Field Trigger                                                      |
| UUIDTrigger               | UUID Trigger                                                            |
| AutoGeohashTrigger        | Auto Geohash Trigger                                                    |
| ForceAutoIncrementTrigger | Force Auto Increment Trigger                                            |
| AutoIncrementTrigger      | Auto Increment Trigger                                                  |
| DefaultValueTrigger       | Default Value Trigger                                                   |
| DeleteForeignKeyTrigger   | Check for Referential Integrity when deleting row from referenced table |
| ForeignKeyTrigger         | Check for Referential Integrity                                         |
+---------------------------+-------------------------------------------------------------------------+

3. Custom Trigger Functions

ResultType

Function

Description

String

getTriggerName()

Define the trigger function name

String

description()

Define a description for the trigger function so users can figure out what it does

Object

buildArguments(String arg)

Build an instance of argumntes from freestyle string. The instance is passed to triggerExecutor method.

TriggerExecutor

triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object args)

Creates an istance of TriggerExecutor. See Trigger Executor Method

public class GenericBackupTriggerExample implements TriggerFunctionI {
  static String triggerName = "genericBackupTrigger";

  @Override
  public String getTriggerName() {
    return triggerName;
  }

  @Override
  public Object buildArguments(String arg) {
    return arg.split(",");
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    String[] args = (String[]) arg;
    return new GenericBackupTriggerExecutor(triggerUtils, newValues, oldValues, args[0], args[1]);
  }

  @Override
  public String description() {
    return "Generic backup Trigger example";
  }
}

3.1. Method triggerExecutor

The method triggerExecutor builds an instance of TriggerExecutor accordingly to the given params.

Param position

Param type

Description

TriggerUtils

Utils. See Trigger Utils

TypedNamedValuesI

New values for table’s row. Null on deletion

TypedNamedValuesI

Old values for table’s row. Null on insertion

String

Trigger argument

3.2. Custom triggerExecutor

ResultType

Function

Description

void

execute()

Executes the procedural code

In order to implement the trigger execution code, you might probably need to use the following final methods:

ResultType

Function

Description

boolean

hasValues()

Tell if the new values are available

boolean

hasOldValues()

Tell if the old values are available

ExpressionType

getType(String column)

Get the column’s type

Object

getValue(String column)

Get new value for column

Object

getOldValue(String column)

Get old value for column

void

setValue(String column, Object value)

Set new value for column

String[]

valuesFieldNames()

Get values field names

Note that TriggerExecutor abstract class constructor has the following parameters:

Param position

Param type

Description

TypedNamedValuesI

New values for table’s row

TypedNamedValuesI

Old values for table’s row

  static public class GenericBackupTriggerExecutor extends TriggerExecutor {

    public final String backupTableName;
    public final String tsColumnName;
    public final TriggerUtils triggerUtils;

    public GenericBackupTriggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, String backupTableName, String tsColumnName) {
      super(newValues, oldValues);
      this.backupTableName = backupTableName;
      this.tsColumnName = tsColumnName;
      this.triggerUtils = triggerUtils;
    }

    @Override
    public String getExecutorName() {
      return triggerName;
    }

    @Override
    public void execute() {
      if (!hasValues()) { // delete
        return;
      }
      try (Session session = triggerUtils.session()) {
        Table table = session.database().getTable(backupTableName);
        Tuple tuple = table.createTuple();
        for (String column : valuesFieldNames()) {
          tuple.put(column, getValue(column));
        }
        tuple.put(tsColumnName, Timestamp.valueOf(LocalDateTime.now()));
        table.insert(tuple);
        // no need to commit when sharing QE's transaction
      } catch (Exception exception) {
        throw new LeanxcaleRuntimeException(exception);
      }
    }
  }

4. Trigger Utils

ResultType

Function

Description

long

localTimestamp()

Returns the current date and time in the session time zone

long

currentTimestamp()

Returns the current date and time in the session adjusted time zone

Long

getTid()

Return current transaction

String

getDbName()

Return connection’s database

String

getSchemaName()

Return connection’s schema

String

uuid()

Generate a UUID

Connection

getConnection()

Get current JDBC connection

Session

session(Settings Settings)

Creates a new session to a Kivi database. It shares the current kivi connection.

Session

session()

Creates a new session to a Kivi database. It shares the current kivi connection and transaction to connect through the direct API.

long

currVal(String sequence)

Get current sequence value or null if it is not set

long

nextVal(String sequence)

Get next sequence value and set current sequence value

String

geohash(List<Object> geohashSourceValues)

Calculate geohash from values. If there is one element, it expects it to be a WKT (well known text). If there is 2, the first one is the latitude and the second is the longitude.

5. Expose a Trigger as a Table Function

public class BackupTriggerExample extends TriggerTableFunction {

  static String triggerName = "backupTrigger";
  static String[] columnNames = {"PK1", "PK2", "F3", "F4"};

  public static String getFunctionName() {
    return triggerName;
  }

  @Override
  public String getTriggerName() {
    return getFunctionName();
  }

  @Override
  public Object buildArguments(String s) {
    return s;
  }

  @Override
  public TriggerExecutor triggerExecutor(TriggerUtils triggerUtils, TypedNamedValuesI newValues, TypedNamedValuesI oldValues, Object arg) {
    return new BackupTriggerExecutor(newValues, oldValues);
  }

  @Override
  public String description() {
    return "Trigger example";
  }

  public ScannableTable eval(Long pk1, String pk2, String f3, Integer f4) {
    Object[] values = {pk1, pk2, f3, f4};
    int[] types = {Types.BIGINT, Types.VARCHAR, Types.VARCHAR, Types.INTEGER};
    NamedValuesI newValues = new TypedNamedValuesImpl(columnNames, values, types);
    return evalTrigger(newValues, null, buildArguments(null));
  }
}

 try (ResultSet rs = s.executeQuery("exec backupTrigger(1, 'a', 'af3', null)")) {
        assertThat(rs.next(), is(true));
        assertThat(rs.getLong(1), is(1L));
        assertThat(rs.getString(2), is("a"));
        assertThat(rs.getString(3), is("af3"));
        assertThat(rs.getInt(4), is(100));
        assertThat(rs.next(), is(false));
      }

5.1. Typed Named Values

The TypedNamedValuesI interface provides a interface to access a value by name. This is the following:

ResultType

Function

Description

ExpressionType

getType(String name)

Get the column’s type

Object

getValue(String name)

Get the value for name. Null when not found or value is null

void

setValue(String name, Object value)

Set the value for name

String[]

getFieldNames()

Get field names

Object[]

getValues()

Get values ordered as getFieldNames()

boolean

contains(String name)

Tells if name exists

int

size()

Tells size of the values

String

sqlTableName()

Table’s name the values belong to. Null when unknown

Anyway, we provide a basic TypedNamedValuesI implementation that might satisfy your needs.

  /**
   * Create a TypedNamesValues
   * @param positionsMap Map value position by name
   * @param row values
   * @param typeMap Map SQL type by name
   */
  public TypedNamedValuesImpl(Map<String, Integer> positionsMap, Object[] row, Map<String, ExpressionType> typeMap) {
    ...
  }

  /**
   * Create a NamedValues object
   * @param names values names
   * @param row values ordered by names
   * @param types  SQL types ordered by names
   */
  public TypedNamedValuesImpl(String[] names, Object[] row, int[] types) {
    ...
  }

5.2. Expression Type

The ExpressionType interface provides information about a table’s column or an expression. This is the following:

ResultType

Function

Description

int

sqlType()

Expression’s SQL JDBC type

int

precision()

Expression’s SQL JDBC type precision

int

scale()

Expression’s SQL JDBC type scale

boolean

isNullable()

Tells if the expression is nullable

6. System Triggers

Despite of custom triggers, we provide some system triggers that could be used directly by the user.

6.1. Auto Increment Trigger

Argument position

Description

Name of the column to be assigned with sequence’s next val when null

Sequence name. We suggest to include the schema name so the trigger won’t fail when connected to a different schema

The following table’s creation is equivalent to the one below:

create table tauto (i int AS IDENTITY start with 10 increment by 10, v VARCHAR, PRIMARY KEY (i));

create table tauto (i int, v VARCHAR, PRIMARY KEY (i));

create sequence tauto_i_seq start with 10 increment by 10;

CREATE TRIGGER tauto_i_seq_trigger BEFORE INSERT ON tauto
FOR EACH ROW EXECUTE autoIncrementTrigger('i,tauto_i_seq')
PRIORITY 100;

-- Note that the autoincrement colums will become an implicit column for inserts
insert into tauto values ('value');

6.2. Geohash Trigger

Argument position

Description

Name of the column to be assigned with the calculated geohash

WKT field if there is only two arguments, or Latitude field if there is three

Longitude field if there is three arguemnts

The following table’s creation is equivalent to the one below:

create table t (wkt VARCHAR, latitude double, longitude double, PRIMARY GEOHASH KEY (wkt), GEOHASH (latitude,longitude));

create table t (wkt VARCHAR, latitude double, longitude double, gh_wkt VARCHAR PRIMARY KEY, gh_lat_long VARCHAR);

CREATE TRIGGER gh_wkt_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_wkt,wkt')
PRIORITY 100;

CREATE TRIGGER gh_lat_long_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE AutoGeohashTrigger('gh_lat_long,latitude,longitude')
PRIORITY 100;

-- Note that GEOHASH clause will create and index too, so we create it here manually
CREATE INDEX gh_lat_long_index ON t (gh_lat_long);

-- Note that the geohash column will become an implicit column for inserts and selects
insert into t values (ST_asText(ST_MakePoint(3.41,40.25)),cast(40.25 as double), cast(3.41 as double));

Geohash fields are fields indexed geographically for GIS applications

6.3. UUID Trigger

The following table’s creation is equivalent to the one below:

create table tauto (v VARCHAR);

create table t (id VARCHAR, v VARCHAR, PRIMARY KEY (id));

CREATE TRIGGER uuid_trigger BEFORE INSERT ON t
FOR EACH ROW EXECUTE UUIDTrigger('id')
PRIORITY 100;

-- Note that the uuid column will become a implicit column for inserts and selects
insert into t values ('value');

6.4. Referential Integrity Triggers

> create table tpk (i int, v VARCHAR, i2 int, PRIMARY KEY (i));

> create table tfk (pk int, v VARCHAR, fk int, PRIMARY KEY (pk), CONSTRAINT fKi2 FOREIGN KEY (fk) REFERENCES tpk(i));

> ENABLE TRIGGER fKi2 ON tfk;

> ENABLE TRIGGER fKi2 ON tpk;

> select * from LXSYSMETA.TRIGGERS;
triggerName      FKI2
triggerFunction  DeleteForeignKeyTrigger
args             {"fkTableName":"APP.TFK","columns":["I"],"fkColumns":["FK"]}
isBefore         true
isInsert         false
isUpdate         false
isDelete         true
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TPK

triggerName      FKI2_NO_CONFLICT
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":false,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         false
tableCat         db
tableSchem       APP
tableName        TFK

triggerName      FKI2
triggerFunction  ForeignKeyTrigger
args             {"doCreateConflict":true,"refTableName":"APP.TPK","columns":["FK"],"refColumns":["I"],"parentTriggerName":"FKI2"}
isBefore         true
isInsert         true
isUpdate         true
isDelete         false
priority         1
isEnable         true
tableCat         db
tableSchem       APP
tableName        TFK

7. NOTICE

As stated, the work of the LeanXcale Query Engine forked from Apache Calcite. For the drivers, Apache Avatica framework is also used:

Apache Calcite
Copyright 2012-2019 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

This product is based on source code originally developed
by DynamoBI Corporation, LucidEra Inc., SQLstream Inc. and others
under the auspices of the Eigenbase Foundation
and released as the LucidDB project.

The web site includes files generated by Jekyll.

Apache Calcite -- Avatica
Copyright 2012-2019 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

Geohash coding is based on https://github.com/kungfoo/geohash-java with the following license file:

Copyright 2016 Silvio Heuberger and contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

LeanXcale system uses spread as communication bus under the following license:

Version 1.0
June 26, 2001

Copyright (c) 1993-2016 Spread Concepts LLC. All rights reserved.

This product uses software developed by Spread Concepts LLC for use in the Spread toolkit.
For more information about Spread, see http://www.spread.org