SQL Data Manipulation Language (DML) Reference

This page describes LeanXcale’s SQL DML.

The SQL syntax is described using an EBNF-like form.

1. Isolation Levels

Database transactions provide the so-called ACID properties. The I stands for isolation and it takes care of the consistency of the data in the advent of concurrent transactions updating the same row(s). The higher the consistency level the most consistency is attained, but it is more costly to execute transactions. The levels from less to more consistency are (short names between parenthesis):

  • Loader: Mode to load data without any other parallel activity. It deactivates conflict checking and logging, a transaction gets as start timestamp the current snapshot. It is highly efficient for loading.

  • Raw_read_committed: (rawrc) This isolation level ensures that a transaction can only see committed changes made by other transactions. It also eliminates dirty reads. When the transaction commits its updates become visible, its updates become visible as they reach the involved storage servers, so it would be possible for another transaction to see only part of the updates made by a transaction. No conflicts are checked (writes over the same row by parallel transactions, transactions overlapping in time are allowed).

  • Read_committed (rc): This isolation level ensures that a transaction can only see committed changes made by other transactions. It also eliminates dirty reads. However, it allows non-repeatable reads, as a transaction may see different values for the same record within the same transaction if other transactions commit changes. When the transaction starts it waits for the snapshot to reach the commit timestamp of the last transaction committed in the session, and then it gets the snapshot as start timestamp.

  • Snapshot_isolation. It guarantees repeatable reads and no phantom reads. This isolation levels is called serializability in other database systems (incorrectly). Only write skew anomalies might happen. In case write skew might happen, it can be avoided by using select for update. The trasaction when started waits for the snapshot to be equal or higher than the commit timestamp of the last committed transaction in the session. Then it gets the current snapshot as start timestamp.

The default isolation mode is read_committed. The isolation mode is specified in the connection string in the driver with the property mode using either the long or short name of the isolation mode. The isolation mode can be changed via the SQL statement SET ISOLATION LEVEL described in Section 5

It should be noted that leanxcale provides fully optimistic concurrency control based on MVCC, this means, that in the advent of concurrent transactions (such that the start of one is between the start and commit of another) updating the same row results in aborting the second transaction performing the update. Other database systems in addition to MVCC has also locking. In those databases with snapshot isolation and serializability concurrent transactions modying the same row will also result in abortin the second transaction performing the update. However, in read committed mode, they don’t abort any of the transactions, just delay the second one till the first one is committed or rolled back. When a transaction is aborted due to a conflict the following exception is raised: …​ ConflictManagerException …​

In case of rollback due to a conflict, the application should handle the exception and retry the transaction.

2. DML SQL Statements

The DML SQL statements supported are:

statement=
	queryStatement
	| upsertStatement
	| insertStatement
	| updateStatement
	| deleteStatement
	| mergeStatement
	| setIsolationLevelStatement
	| explainStatement
	| execStatement
	| batchStatement

Multiple SQL statements can be provided by separating them with semicolons:

statementList=
	statement; [statement';']*

3. Query Statement

A query statement can have the following forms:

queryStatement=
	( selectStatement
	| selectForUpdateStatement
	| unionStatement
	| intersectStatement
	| exceptStatement
	| intersectStatement
	| withStatement
	| valuesStatement )
		[ limitClause ]
		[ offsetClause ]
		[ fetchClause ]

3.1. SELECT Statement

The most common statement is the SELECT statement. The SELECT is accepted with FROM (the standard) and also without FROM to have compatibility with those SQL dialects that accept the SELECT without FROM:

selectStatement=
	( SELECT [ ALL | DISTINCT ]
		{ * | projectItem [, projectItem ]* }
	[ FROM tableExpression ]
	[ WHERE booleanExpression ]
	[ GROUP BY { groupItem [, groupItem ]* } ]
	[ HAVING booleanExpression ]
	[ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]
	[ FOR UPDATE]
	[ ORDER BY item(, item)*]
	[ limitClause
	| OffsetClause [limitClause | FetchClause]
	| FetchClause ]

3.1.1. ALL/DISTINCT clause

Specifying the DISTINCT clause, filters out the same rows and returns only different (i.e., distinct) values in the resultset. The ALL clause returns all values, i.e., including repeated rows, it is the default behavior.

After the optional ALL/DISTINCT clause the list of columns to be projected is provided:

projectItem=
      expression [ [ AS ] columnAlias ]
  |   tableAlias . *

3.1.2. FROM Clause

In the optional FROM clause a tableExpression is provided that is basically the list of tables used in the select and how they should be joined (kind of join, join condition):

tableExpression=
	tableReference [, tableReference ]*
	| tableExpression [ NATURAL ]
	[ ( LEFT | RIGHT | FULL ) [ OUTER ] ]
		JOIN tableExpression [ joinCondition ]
	| tableExpression CROSS JOIN tableExpression

tableReference=
	tablePrimary
	[ matchRecognize ]
	[ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]

tablePrimary=
	[ [ catalogName . ] schemaName . ] tableName
		'(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'
	| [ LATERAL ] '(' queryStatement ')'
	| UNNEST '(' expression ')' [ WITH ORDINALITY ]
	| [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]* ')' ')'

joinCondition=
	ON booleanExpression
	| USING '(' column [, column ]* ')'

A SELECT without FROM is equivalent to VALUES (included for compatibility with PosgreSQL), as for example:

SELECT 1, CURRENT_TIMESTAMP;

This just yields the value 1 and the current timestamp in the database server.

3.1.3. WHERE Clause

The optional WHERE clause enables to provide a filter condition to select only the rows that fulfill the condition.

[ WHERE booleanExpression ]

3.1.4. GROUP BY and HAVING Clauses

The optional GROUP BY clause:

[ GROUP BY { groupItem [, groupItem ]* } ]
 	[ HAVING booleanExpression ]

enables to aggregate rows with the same values in a set of columns into summary rows.

groupItem=
	expression
	| '(' ')'
	| '(' expression [, expression ]* ')'
	| CUBE '(' expression [, expression ]* ')'
	| ROLLUP '(' expression [, expression ]* ')'
	| GROUPING SETS '(' groupItem [, groupItem ]* ')'

The optional HAVING clause the HAVING clause is used in conjunction with the GROUP BY clause to filter the results of a query based on aggregate functions applied to grouped data. The reason is because the WHERE keyword cannot be used with aggregate functions.

In the SELECT, HAVING and ORDER BY clauses of an aggregate query, all expressions must be constant within the current group (that is, grouping constants as defined by the GROUP BY clause, or constants), or aggregate functions, or a combination of constants and aggregate functions. Aggregate and grouping functions may only appear in an aggregate query, and only in a SELECT, HAVING or ORDER BY clause.

3.1.5. WINDOW Clause

The optional WINDOW clause:

[ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]

is used in conjunction with window functions to define a window of rows over which the window function will operate. Window functions perform a calculation across a set of table rows related to the current row. The WINDOW clause allows you to specify the window frame or partition over which the window function operates:

window=
	windowName
	| windowSpec

windowSpec=
	'('
	[ windowName ]
	[ ORDER BY orderItem [, orderItem ]* ]
	[ PARTITION BY expression [, expression ]* ]
	[ RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
		| ROWS numericExpression { PRECEDING | FOLLOWING }
	]
	')'

3.1.6. ORDER BY Clause

The ORDER BY clause allows to sort the resultset by a set of columns and/or expressions:

orderByClause=
	ORDER BY expression [DESC | ASC] [NULLS (FIRST | LAST)] [, expression [DESC | ASC]]*

For each expression (it can be just a column name) one can optionally specify if the order is ascending (ASC) or descending (DESC). One can also optionally specify whether NULLS should appear before other values (NULLS FIRST) or after other values (NULLS LAST). By default, ordering is ascending and nulls appears at the end (i.e., ASC NULLS LAST for all expressions).

3.1.7. LIMIT Clause

The optional LIMIT clause is used to restrict the number of rows returned by a query. Additionally, an additional OFFSET clause can be used which allows to skip a certain number of rows before starting to return rows. The syntax is:

limitClause=
		[ LIMIT (count | ALL) ]

3.1.8. OFFSET Clause

The optional OFFSET clause:

offsetClause =
	[ OFFSET start ( ROW | ROWS ) ]

is used in conjunction with the LIMIT clause to specify the starting point within the result set for fetching rows. It allows you to skip a certain number of rows before beginning to return rows.

3.1.9. FETCH Clause

The optional FETCH clause:

fetchClause =
	[ FETCH ( FIRST | NEXT ) [ count ] ( ROW | ROWS ) ONLY ]

is used to retrieve a specified number of rows from the result set produced by a query. It is often used in conjunction with the OFFSET clause for pagination purposes, where you want to retrieve a specific subset of rows from a larger result set. The FIRST keyword indicates that you want to retrieve the first set of rows from the result set. The NEXT keyword indicates that you want to retrieve the next set of rows from the result set. The ONLY keyword is used in conjunction with the FETCH clause to specify that only the specified number of rows should be returned, and no additional rows should be fetched beyond that limit.

3.1.10. FOR UPDATE Clause

Snapshot isolation isolation level can suffer from write-skew anomaly in some scenarios, when an integrity constraint should be preserved among 2 or more rows, updating only a subset of them. An example of write skew is an operation that can withdraw funds from any two accounts for the same customer as far as the joint balance is positive (i.e. the sum bot both account balances is greater or equal than zero).

Assume that initially two accounts from the same customer with customerid 27 has a joint balance of 10:

+------------+---------+------------+
| ACCOUNTID | BALANCE | CUSTOMERID |
+------------+---------+------------+
| 1          | 3       | 27         |
| 2          | 7       | 27         |
+------------+---------+------------+

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

We have 2 concurrent transactions with isolation level set to snapsthot isolation and with AUTOCOMMIT set to OFF. The first transaction wants to withdraw from accountid 1 an amount of 5. It checks the current joint balance:

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

Since 5 ⇐ 10 it will proceed to subtract 5 from accountid 1. However, before the transaction commits, a second transaction wants to withdraw from accountid 2 an amount of 6. So it checks the current joint balance:

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=1;
+--------+
| EXPR$0 |
+--------+
| 10     |
+--------+

Since 6 ⇐ 10 it will proceed to subtract 6 from accountid 2.

Now the first transaction updates accountid 1 with an amount 3-5=-2 and commits:

UPDATE your_table_name
SET BALANCE = -2
WHERE ACCOUNTID = 1;
COMMIT;

Now the second transaction updates accountid 2 with an amount 5-6=-1 and commits:

UPDATE your_table_name
SET BALANCE = -1
WHERE ACCOUNTID = 2;
COMMIT;

The final result after committing both transactions is:

+------------+---------+------------+
| ACCOUNTID  | BALANCE | CUSTOMERID |
+------------+---------+------------+
| 1          | -1      | 27         |
| 2          | -2      | 27         |
+------------+---------+------------+

That violates the integrity constraint of keeping a non negative joint balance. The reason is that both transactions observe the same snapshot.

By using SELECT FOR UPDATE in both transactions, the first transaction doing the update over the pair of involved rows will succeed, while the second one will be aborted because the UPDATE of the first transaction over accountid 1 will conflict with the SELECT FOR UPDATE performed by the second transaction over accountid 1, and thus, resulting in its abortion.

SELECT FOR UPDATE has the same syntax as SELECT except using the clause FOR UPDATE at the end of the SELECT.

3.2. UNION, INTERSECT, and EXCEPT Statements

The union, intersect, and except statements can combine resultsets in different ways performing the set operations union, intersect and difference, respectively, and have the following syntax:

unionStatement=
 	queryStatement UNION [ ALL | DISTINCT ] queryStatement
intersectStatement=
	queryStatement INTERSECT [ ALL | DISTINCT ] queryStatement
exceptStatement=
	queryStatement EXCEPT [ ALL | DISTINCT ] queryStatement

MINUS is not supported but EXCEPT is equivalent to MINUS.

3.3. WITH Statement

The second form of a query has a with clause that has the following syntax:

withStatement=
	WITH withItem [, withItem]* queryStatement
withItem=
	name [ '(' column [, column ]* ')' ]
		AS '(' queryStatement ')'....

3.4. VALUES Statement

The VALUES statement basically defines a table literal with one or more rows, that is, it is a table value constructor. Its syntax is:

values=
	VALUES rowExpression [, rowExpression ]*

Where a rowExpression is specified as follows:

rowExpression=
	expression
	| [ROW] '(' expression [, expression]* ')'

3.5. Subqueries

A scalar sub-query is a sub-query used as an expression. If the sub-query returns no rows, the value is NULL; if it returns more than one row, it is an error. IN, EXISTS and scalar sub-queries can occur in any place where an expression can occur (such as the SELECT clause, WHERE clause, ON clause of a JOIN, or as an argument to an aggregate function).

An IN, EXISTS, or scalar sub-query may be correlated, that is, it may refer to tables in the FROM clause of an enclosing query.

4. UPSERT, INSERT, and UPDATE Statements

For modifying the database there are four kinds of statements allowing to upsert, insert, update, and delete rows from the database. Upsert updates or insert depending whether a row exists with the provided primary key or not. Insert adds a new row with that primary key, if there is no row with that primary key already in the table (otherwise it returns an error). Update modifies a row with a particular primary key, if there is a row in the table with that primary (otherwise it returns an error). Upsert is preferred over insert and update since it is the most efficient. Insert and update both require to perform an additional read to check whether the row exists or not, basically duplicating the cost of the statement. Their syntax is as follows:

upsertStatement=
	UPSERT INTO tablePrimary
		[ '(' column [, column ]* ')' ]
			queryStatement

insertStatement=
	INSERT INTO tablePrimary
		[ '(' column [, column ]* ')' ]
 			queryStatement

updateStatement=
 	UPDATE tablePrimary
 		SET assign [, assign ]*
 			[ WHERE booleanExpression ]

deleteStatement=
	DELETE FROM tablePrimary [ [ AS ] alias ]
		[ WHERE booleanExpression ]

In INSERT and UPSERT statements if the list of target columns is not provided, then the statement must have the same number of columns as the target table and the order of the fields in the VALUES clause must be the same as the order of the fields in the table.

5. SET ISOLATION LEVEL Statement

The current isolation level can be changed within the transaction using the SET ISOLATION LEVEL statement that has the following syntax:

setIsolationLevelStatement=
	SET ISOLATION LEVEL [TRANSACTION]
		LOADER | NONE
		| RAW READ COMMITTED | RRC | READ UNCOMMITTED
		| READ COMMITTED | RC
		| SNAPSHOT ISOLATION | SI | SERIALIZABLE | REPEATABLE READ

LeanXcale has multiversion concurrency control that provides highly efficient isolation management between reads and writes. The isolation levels were defined multiversion concurrency control was invented and are based on the notion of locking. LeanXcale supports 4 isolation levels:

  • LOADER (same as NONE). This mode is similar to the NONE standard mode in that it does not have neither concurrency control nor logging. It should be used for massive loads when nobody else is using the database, for instance, a night batch ingesting massive data.

  • RAW READ COMMITTED (same as RRC or READ UNCOMMITTED). This mode is a little more relaxed than READ COMMITTED and allows to read the most current version of any row. It is interesting for applications where one does not want to read from a consistent snapshot in the commit order, but instead the most recent value of the row. When setting READ UNCOMMITTED, this is the mode used that provides more isolation.

  • READ COMMITTED (same as RC). For each SQL statement it uses the most recent consistent snapshot (i.e., longest prefix of committed transactions in commit timestamp order). The equivalent in multiversion concurrency control of locking read committed.

  • SNAPSHOT ISOLATION (same as SI, SERIALIZABLE, REPEATABLE READ, SESSION, SESSION CONSISTENCY). The transaction at start time it gets the current snapshot and performs all reads within the transaction from this snapshot plus the updates performed by the transaction itself. It is roughly similar to SERIALIZABLE.

Note that the optional [TRANSACTION] keyword is just syntactic sugar for compatibility with other SQL dialects in which is mandatory to use it.

6. EXPLAIN Statement

The EXPLAIN statement information about the execution plan of a SQL query. The execution plan describes how the database engine will execute the query and retrieve the required data. This information can be helpful for database administrators, developers, and analysts to optimize and tune their queries for better performance:

explainStatement=
	EXPLAIN PLAN
		[ WITH TYPE | WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION ]
		[ EXCLUDING ATTRIBUTES | INCLUDING [ ALL ] ATTRIBUTES ]
		[ AS JSON | AS XML ]
			FOR ( queryStatement | insertStatement | updateStatement | deleteStatement )

7. SQL Hints

A hint is an instruction to the optimizer. When writing SQL, you may know information about the data unknown to the optimizer. Hints enable you to force decisions otherwise made by the optimizer.

  • Planner enforcers: there’s no perfect planner, so it makes sense to implement hints to allow user better control the execution. For instance: “never merge this subquery with others” (/+ no_merge */). “treat those tables as leading ones” (/+ leading */) to affect join ordering, etc;

  • Append meta data/statistics: some statistics like “table index for scan” or “skew info of some shuffle keys” are somehow dynamic for the query, it would be very convenient to config them with hints because the planning metadata from the planner might not be sufficiently accurate.

  • Operator resource constraints: for many cases, a default resource configuration is assigned for the execution operators, i.e., min parallelism, memory (resource consuming UDF), special resource requirement (GPU or SSD disk), …​

LeanXcale supports two kinds of hints:

  • Query Hint: right after the SELECT keyword;

  • Table Hint: right after the referenced table name.

This is the syntax:

query=
	SELECT /*+ hints */
		...
	FROM
		tableName /*+ hints */
	JOIN
		tableName /*+ hints */
		...

hints=
	hintItem[, hintItem ]*

hintItem=
	hintName
	| hintName(optionKey=optionVal[, optionKey=optionVal ]*)
	| hintName(hintOption [, hintOption ]*)

optionKey=
	simpleIdentifier
	| stringLiteral

optionVal=
	stringLiteral

hintOption=
	simpleIdentifier
	| numericLiteral
	| stringLiteral

8. EXEC Statement

The EXEC statement enables to invoke a table function. Its syntax is:

execStatement=
	EXEC '(' [ nonQueryExpression [, nonQueryExpression]* ] ')'

The parameters can be any expression that is not a query returning a value of the parameter type.

9. BATCH Statement

When auto-commit is set, and requesting an insert/update/delete operation from the result of a select statement with a large resultset, one might one instead of performing a large commit of the whole set of updates, a commit for smaller batches of updates. This can be performed with the BATCH statement that changes the implicit behavior of auto-commit. Instead of committing one time at the end of the statement, it will commit every number of indicate updates to commit every number of times indicated in the parameter of the BATCH statement:

batchStatement=
	BATCH unsignedIntLiteral   |   BATCH unsignedIntLiteral (sqlInsert | sqlUpdate | sqlDelete)

10. Identifiers

Identifiers are the names of tables, columns and other metadata elements used in an SQL query. Identifiers can be quoted or unquoted. Quoted identifiers, such as "Employee Name", start and end with double quotes. Unquoted identifiers, such as employee, must start with a letter and can only contain letters, digits, and underscores. They are implicitly converted to upper case. Identifiers are case insensitive when unquoted (actually, it is as they were written in CAPS), and case sensitive when quoted. Quoted identifiers may contain virtually any character, including spaces and other punctuation. If you wish to include a double quote in an identifier, use another double quote to escape it, as in this example: "An employee called ""Fred"".". Matching identifiers to the name of the referenced object is case-sensitive.

11. Data Types

11.1. Scalar Data Types

Data type Description Range and example literals Can be stored

BOOLEAN

Logical values

Values: TRUE, FALSE, UNKNOWN

Yes

SMALLINT

2 byte signed integer

Range is -32768 to 32767

Yes

INTEGER, INT

4 byte signed integer

Range is -2147483648 to 2147483647

Yes

BIGINT

8 byte signed integer

Range is -9223372036854775808 to 9223372036854775807

Yes

REAL, FLOAT

4 byte floating point

6 decimal digits precision

Yes

DOUBLE

8 byte floating point

15 decimal digits precision

Yes

DECIMAL(p, s)

Fixed point

Example: 123.45 is a DECIMAL(5, 2) value.

Yes

NUMERIC

Fixed point

Yes

CHAR(n), CHARACTER(n)

Fixed-width character string

‘Hello’, ‘’ (empty string), _latin1’Hello’, n’Hello’, _UTF16’Hello’, ‘Hello’ ‘there’ (literal split into multiple parts)

Yes

VARCHAR(n), CHARACTER VARYING(n)

Variable-length character string

As CHAR(n)

Yes

BINARY(n)

Fixed-width binary string

x’45F0AB’, x’’ (empty binary string), x’AB’ ‘CD’ (multi-part binary string literal)

Yes

VARBINARY(n), BINARY VARYING(n)

Variable-length binary string

As BINARY(n)

Yes

DATE

Date

Example: DATE ‘1969-07-20’

Yes

TIME

Time of day

Example: TIME ‘20:17:40’

Yes. This datatype has milliseconds precision

TIMESTAMP

Date and time

Example: TIMESTAMP ‘1969-07-20 20:17:40’

Yes. This datatype has microseconds precision

INTERVAL timeUnit [ TO timeUnit ]

Date time interval

Examples: INTERVAL ‘1-5’ YEAR TO MONTH, INTERVAL ‘45’ DAY, INTERVAL ‘1 2:34:56.789’ DAY TO SECOND

No

TIMESTAMP WITH TIME ZONE

Date and time with time zone

Example: TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’

No

TIMESTAMP WITH LOCAL TIME ZONE

Date and time with time zone

Example: TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’

No

GEOMETRY

Geometry

Examples: ST_GeomFromText(‘POINT (30 10)’)

No

BIGINT ARRAY

Array of longs

[1, 2, 3]

Yes

DOUBLE ARRAY

Array of doubles

[1.1, 2.2, 3.3]

Yes

VARCHAR ARRAY

Array of strings

['string1', 'string2']

Yes

11.2. Non Scalar Types

LeanXcale currently supports the following non scalar types:

Type Description Example literals Can be stored

ARRAY

Ordered, contiguous collection that may contain duplicates

Example: varchar(10) array

Yes

MULTISET

Unordered collection that may contain duplicates

Example: int multiset

No

MAP

Collection of keys mapped to values

No

CURSOR

Cursor over the result of executing a query

No

11.2.1. Array Data Type

A sentence to create a table with array types could be the following:

CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR ARRAY, PRIMARY KEY(id));

Or the following, if you want to restrict the number of characters of each VARCHAR position in the VARCHAR ARRAY.

CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR (20) ARRAY, PRIMARY KEY(id));

An array literal has the following syntax:

arrayLiteral=
	ARRAY '[' value [, value]* ']'

If we query the above sample table TARRY, the result will be shown this way:

SELECT * FROM tarry
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID |                     LONGS                      |                    DOUBLES                     |         STRINGS         |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| 1  | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
+----+------------------------------------------------+------------------------------------------------+-------------------------+

In order to insert a NULL value to an array field, can be done in two ways:

  • Replace the NULL keyword for the emptyList() function.

INSERT INTO tarry VALUES (2, ARRAY[2334589765239847563, 0, 44], emptyList(), ARRAY['CUENCA','BILBAO']);
  • Specify only the columns in which we want to insert values. In unlisted columns, NULL values will be inserted.

INSERT INTO tarry (id, longs, strings) VALUES (2, array[2334589765239847563, 0, 44], ARRAY['CUENCA','BILBAO']);

The result will be:

+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID |                     LONGS                      |                    DOUBLES                     |         STRINGS         |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| 1  | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
| 2  | [2334589765239847563, 0, 44]                   |                                                | [CUENCA, BILBAO]        |
+----+------------------------------------------------+------------------------------------------------+-------------------------+

11.2.2. Non Scalar Value Constructors

The next table summarizes the constructors for non scalar values:

Operator syntax Description

ROW (value [, value ]*)

Creates a row from a list of values.

(value [, value ]* )

Creates a row from a list of values.

MAP ‘[’ key ‘]’

Returns the element of a map with a particular key.

ARRAY ‘[’ index ‘]’

Returns the element at a particular location in an array.

ARRAY ‘[’ value [, value ]* ‘]’

Creates an array from a list of values.

MAP ‘[’ key, value [, key, value ]* ‘]’

Creates a map from a list of key-value pairs.

11.2.3. Non Scalar Functions

The supported functions for non scalar types are shown in the below table:

Operator syntax Description

ELEMENT(value)

Returns the sole element of an array or multiset; null if the collection is empty; throws if it has more than one element.

CARDINALITY(value)

Returns the number of elements in an array or multiset.

value MEMBER OF multiset

Returns whether the value is a member of multiset.

multiset IS A SET

Whether multiset is a set (has no duplicates).

multiset IS NOT A SET

Whether multiset is not a set (has duplicates).

multiset IS EMPTY

Whether multiset contains zero elements.

multiset IS NOT EMPTY

Whether multiset contains one or more elements.

multiset SUBMULTISET OF multiset2

Whether multiset is a submultiset of multiset2.

multiset NOT SUBMULTISET OF multiset2

Whether multiset is not a submultiset of multiset2.

multiset MULTISET UNION [ ALL | DISTINCT ] multiset2

Returns the union multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET INTERSECT [ ALL | DISTINCT ] multiset2

Returns the intersection of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET EXCEPT [ ALL | DISTINCT ] multiset2

Returns the difference of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

Note: the UNNEST relational operator converts a collection to a relation.

12. Type Conversion

LeanXcale supports both implicit and explicit conversion in expressions.

12.1. Implicit and Explicit Type Conversion

It is recommended to specify explicit conversions, rather than implicit conversions, for these reasons:

  • SQL statements are easier to understand when you use explicit datatype conversion functions.

  • Implicit datatype conversion can have a negative impact on performance, especially if the datatype of a column value is converted to that of a constant rather than the other way around.

  • Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR value may return an unexpected format.

12.2. Explicit Type Conversion

Explicit type conversion is attained with the CAST function:

Operator syntax Description

CAST(value AS type)

Converts a value to type.

The supported types in the CAST function are:

type=
	typeName
		[ collectionsTypeName ]*

typeName=
	sqlTypeName
	| rowTypeName
 	| compoundIdentifier

sqlTypeName=
	CHAR [ precision ] [ charSet ]
	| VARCHAR [ precision ] [ charSet ]
	| DATE
	| TIME
	| TIMESTAMP
	| GEOMETRY
	| DECIMAL [ precision [, scale] ]
	| BOOLEAN
	| TINYINT
	| SMALLINT
	| INTEGER
	| BIGINT
	| BINARY [ precision ]
	| VARBINARY [ precision ]
	| REAL
	| DOUBLE
	| FLOAT
	| ANY [ precision [, scale] ]

collectionsTypeName=
	ARRAY | MULTISET

rowTypeName=
	ROW '('
		fieldName1 fieldType1 [ NULL | NOT NULL ]
		[ , fieldName2 fieldType2 [ NULL | NOT NULL ] ]*
	')'

char=
	CHARACTER | CHAR

varchar=
	char VARYING | VARCHAR

decimal=
	DECIMAL | DEC | NUMERIC

integer:
	INTEGER | INT

varbinary=
	BINARY VARYING | VARBINARY

double=
	DOUBLE [ PRECISION ]

time=
	TIME [ precision ] [ timeZone ]

timestamp=
	TIMESTAMP [ precision ] [ timeZone ]

charSet=
	CHARACTER SET charSetName

timeZone=
	WITHOUT TIME ZONE
	| WITH LOCAL TIME ZONE

12.3. Implicit Type Conversion

refers to the automatic conversion of one data type to another by the database system without requiring explicit instructions from the programmer. This conversion occurs during the execution of SQL statements or expressions. The following table shows all possible conversions, without regard to the context in which it is made. The following convention is used:

  • i: implicit cast

  • e: explicit cast

  • x: not allowed

FROM - TO NULL BOOLEAN TINYINT SMALLINT INT BIGINT DECIMAL FLOAT or REAL DOUBLE INTERVAL DATE TIME TIMESTAMP CHAR or VARCHAR BINARY or VARBINARY

NULL

i

i

i

i

i

i

i

i

i

i

i

i

i

i

i

BOOLEAN

x

i

e

e

e

e

e

e

e

x

x

x

x

i

x

TINYINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

SMALLINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

INT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

BIGINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

DECIMAL

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

FLOAT/REAL

x

e

i

i

i

i

i

i

i

x

x

x

e

i

x

DOUBLE

x

e

i

i

i

i

i

i

i

x

x

x

e

i

x

INTERVAL

x

x

e

e

e

e

e

x

x

i

x

x

x

e

x

DATE

x

x

x

x

x

x

x

x

x

x

i

x

i

i

x

TIME

x

x

x

x

x

x

x

x

x

x

x

i

e

i

x

TIMESTAMP

x

x

e

e

e

e

e

e

e

x

i

e

i

i

x

CHAR or VARCHAR

x

e

i

i

i

i

i

i

i

i

i

i

i

i

i

BINARY or VARBINARY

x

x

x

x

x

x

x

x

x

x

e

e

e

i

i

12.3.1. Conversion Contexts and Strategies

  • Set operation (UNION, EXCEPT, INTERSECT): Analyze the data types of each row in every branch and determine the common type for each pair of fields

  • Binary arithmetic expression (+, -, &, ^, /, %): promote string operand to data type of the other numeric operand;

  • Binary comparison (=, <, , <>, >, >=): if operands are STRING and TIMESTAMP, promote to TIMESTAMP; make 1 = true and 0 = false always evaluate to TRUE. If there is numeric type operand, find common type for both operands.

  • IN sub-query: compare type of left hand side and right hand side, and find the common type. If it is a non scalar type, find the widest type for every field.

  • IN expression list: compare every expression to find the common type.

  • CASE WHEN expression or COALESCE: find the common widest type of the THEN and ELSE operands.

  • Character + INTERVAL or character - INTERVAL: promote character to TIMESTAMP.

  • Built-in function: look up the type families registered in the checker, find the family default type if checker rules allow it.

  • User-defined function (UDF): coerce based on the declared argument types of the eval() method.

  • INSERT and UPDATE: coerce a source field to counterpart target table field’s type if the two fields differ with type name or precision(scale).

Note:

Implicit type coercion of the following cases is ignored:

  • One of the type is ANY;

  • Type coercion within CHARACTER types are always ignored, i.e. from CHAR(20) to VARCHAR(30);

  • Type coercion from a numeric to another with higher precedence is ignored, i.e. from INT to LONG.

12.3.2. Strategies for Finding Common Type

  • If the operator has the expected data types, just take them as the desired one, e.g., the UDF would have eval() method which has reflection argument types.

  • If there is no expected data type but the data type families are registered, try to coerce the arguments to the family’s default data type, i.e. the String family will have a VARCHAR type.

  • If neither expected data type nor families are specified, try to find the tightest common type of the node types, i.e., INTEGER and DOUBLE will return DOUBLE, the numeric precision does not lose for this case.

  • If no tightest common type is found, try to find a wider type, i.e. VARCHAR and INTEGER will return INTEGER, we allow some precision loss when widening decimal to fractional, or promote to VARCHAR type.

13. Operators and Functions

The following sections provides a summary of all operators and functions supported by LeanXcale.

13.1. Comparison Operators

The following table provides the list of supported comparison operators:

Operator syntax Description

value1 = value2

Equals

value1 <> value2

Not equal

value1 > value2

Greater than

value1 >= value2

Greater than or equal

value1 < value2

Less than

value1 <= value2

Less than or equal

value IS NULL

Whether value is null

value IS NOT NULL

Whether value is not null

value1 IS DISTINCT FROM value2

Whether two values are not equal, treating null values as the same

value1 IS NOT DISTINCT FROM value2

Whether two values are equal, treating null values as the same

value1 BETWEEN value2 AND value3

(value1 >= value2) AND (value1 <= value3)

value1 NOT BETWEEN value2 AND value3

Whether value1 is less than value2 or greater than value3

string1 LIKE patternString [ ESCAPE string3 ]

Whether string1 matches pattern patternString

string1 NOT LIKE patternString [ ESCAPE string3 ]

Whether string1 does not match pattern patternString

string1 SIMILAR TO patternString [ ESCAPE string3 ]

Whether string1 matches regular expression patternString

string1 NOT SIMILAR TO patternString [ ESCAPE string3 ]

Whether string1 does not match regular expression patternString

value IN (value [, value]*)

Whether value is equal to a value in a list

value NOT IN (value [, value]*)

Whether value is not equal to any value in a list

value IN (sub-query)

Whether value is equal to a row returned by sub-query

value NOT IN (sub-query)

Whether value is not equal to any row returned by sub-query

value comparison SOME (sub-query)

Whether value fulfills the comparison for at least one row returned by sub-query

value comparison ANY (sub-query)

Synonym for SOME

value comparison ALL (sub-query)

Whether value fulfills the comparison for every row returned by sub-query

EXISTS (sub-query)

Whether sub-query returns at least one row

13.2. Regular Expressions

LIKE and SIMILAR TO operators allow to perform pattern matching using regular expressions. You can use the special characters ('%','_') and you can also use regular expressions with a syntax similar to perl-like regular expressions. Specifically, regular expressions may contain.

  • char or char sequence: A single character or a sequence of characters that matches itself

  • '.': The dot is a wildcard that matches any single character except for a newline

  • '^': Matches the start of a line

  • '[a-z]', Represents a character class. In this example, it matches any lowercase letter from 'a' to 'z'

  • '[^a-z]': Represents a negated character class. It matches any character that is not a lowercase letter from 'a' to 'z'

  • '\a': Represents a predefined character class for alphanumeric characters. It includes letters (upper and lower case) and digits

  • '\u': Represents a predefined character class for uppercase letters

  • '\l': Represents a predefined character class for lowercase letters

  • '\b': Represents a predefined character class for whitespace characters, such as space and tab

  • '\w': Represents a predefined character class for word characters. It includes letters, digits, and underscore

They may also contain the following operators (from lowest to highest precedence):

  • '(' and ')': for grouping

  • '|': to provide alternatives

  • concatenation: implicit, no operator

  • '*': zero or more times

  • '+': one or more times

  • '?': zero or one time

It should be noted that the main difference between LIKE and SIMILAR TO is that LIKE implies the expression will start with the text in the expression while SIMILAR doesn’t imply '^'.

13.3. Logical Operators

The table below outlines the logical operators:

Operator syntax Description

boolean1 OR boolean2

Whether boolean1 is TRUE or boolean2 is TRUE

boolean1 AND boolean2

Whether boolean1 and boolean2 are both TRUE

NOT boolean

Whether boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN

boolean IS FALSE

Whether boolean is FALSE; returns FALSE if boolean is UNKNOWN

boolean IS NOT FALSE

Whether boolean is not FALSE; returns TRUE if boolean is UNKNOWN

boolean IS TRUE

Whether boolean is TRUE; returns FALSE if boolean is UNKNOWN

boolean IS NOT TRUE

Whether boolean is not TRUE; returns TRUE if boolean is UNKNOWN

boolean IS UNKNOWN

Whether boolean is UNKNOWN

boolean IS NOT UNKNOWN

Whether boolean is not UNKNOWN

13.4. Mathematical Operators and Functions

This table provides an overview of arithmetic operators and functions:

Operator syntax Description

+ numeric

Returns numeric

- numeric

Returns negative numeric

numeric1 + numeric2

Returns numeric1 plus numeric2

numeric1 - numeric2

Returns numeric1 minus numeric2

numeric1 * numeric2

Returns numeric1 multiplied by numeric2

numeric1 / numeric2

Returns numeric1 divided by numeric2

POWER(numeric1, numeric2)

Returns numeric1 raised to the power of numeric2

ABS(numeric)

Returns the absolute value of numeric

MOD(numeric1, numeric2)

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

SQRT(numeric)

Returns the square root of numeric

LN(numeric)

Returns the natural logarithm (base e) of numeric

LOG2(numeric)

Returns the base 2 logarithm of numeric

LOG10(numeric)

Returns the base 10 logarithm of numeric

LOG(numeric1 [, numeric2])

Returns the base numeric2 (default is 10) logarithm of numeric1

EXP(numeric)

Returns e raised to the power of numeric

CEIL(numeric)

Rounds numeric up, returning the smallest integer that is greater than or equal to numeric

FLOOR(numeric)

Rounds numeric down, returning the largest integer that is less than or equal to numeric

RANDOM()

Generates a random double between 0 and 1 inclusive

RANDOM_INT(numeric)

Generates a random integer between 0 and numeric-1 inclusive

RANDOM_SEED(bigint)

Seeds the random generator; valid within the current session

ACOS(numeric)

Returns the arc cosine of numeric

ACOSD(numeric)

Returns the arc cosine of numeric in degrees

ACOSH(numeric)

Returns the hyperbolic arc cosine of numeric

ASIN(numeric)

Returns the arc sine of numeric

ASIND(numeric)

Returns the arc sine of numeric in degrees

ASINH(numeric)

Returns the hyperbolic arc sine of numeric

ATAN(numeric)

Returns the arc tangent of numeric

ATAND(numeric)

Returns the arc tangent of numeric in degrees

ATANH(numeric)

Returns the hyperbolic arc tangent of numeric

ATAN2(numeric, numeric)

Returns the arc tangent of the numeric coordinates

ATAN2D(numeric, numeric)

Returns the arc tangent of the numeric coordinates in degrees

CBRT(numeric)

Returns the cube root of numeric

COS(numeric)

Returns the cosine of numeric

COSD(numeric)

Returns the cosine of numeric; argument in degrees

COSH(numeric)

Returns the hyperbolic cosine of numeric

COT(numeric)

Returns the cotangent of numeric

COTD(numeric)

Returns the cotangent of numeric; argument in degrees

DEGREES(numeric)

Converts numeric from radians to degrees

E()

Returns e

FACTORIAL(bigint)

Returns the factorial of bigint

GCD(bigint1, bigint2)

Returns the greatest common divisor of bigint1 and bigint2

LCM(bigint1, bigint2)

Returns the least common multiple of bigint1 and bigint2

PI()

Returns a value that is closer than any other value to pi

RADIANS(numeric)

Converts numeric from degrees to radians

ROUND(numeric1 [, numeric2])

Rounds numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

SIGN(numeric)

Returns the signum of numeric

SIN(numeric)

Returns the sine of numeric

SIND(numeric)

Returns the sine of numeric; argument in degrees

SINH(numeric)

Returns the hyperbolic sine of numeric

TAN(numeric)

Returns the tangent of numeric

TAND(numeric)

Returns the tangent of numeric; argument in degrees

TANH(numeric)

Returns the hyperbolic tangent of numeric

TRUNCATE(numeric1 [, numeric2])

Truncates numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

13.5. String Operators and Functions

A summary of string operators and functions is depicted in the below table:

Operator syntax Description

string || string

Concatenates two character strings

CONCAT_WS(string0, string1, string2 [, …​])

Concatenates two or more character strings using string0 as delimiter

CHAR_LENGTH(string)

Returns the number of characters in a character string

CHARACTER_LENGTH(string)

As CHAR_LENGTH(string)

UPPER(string)

Returns a character string converted to upper case

LOWER(string)

Returns a character string converted to lower case

POSITION(string1 IN string2)

Returns the position of the first occurrence of string1 in string2

POSITION(string1 IN string2 FROM integer)

Returns the position of the first occurrence of string1 in string2 starting at a given point (not standard SQL)

TRIM( { BOTH | LEADING | TRAILING } string1 FROM string2)

Removes the longest string containing only the characters in string1 from the start/end/both ends of string1

OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])

Replaces a substring of string1 with string2

SUBSTRING(string FROM integer)

Returns a substring of a character string starting at a given point

SUBSTRING(string FROM integer FOR integer)

Returns a substring of a character string starting at a given point with a given length

INITCAP(string)

Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

LEFT(string, integer)

Returns the first integer characters of string

RIGHT(string, integer)

Returns the last integer characters of string

LPAD(string1, integer [, string2])

Fill up string1 to length integer by prepending the characters string2 (a space by default)

RPAD(string1, integer [, string2])

Fill up string1 to length integer by appending the characters string2 (a space by default)

REVERSE(string)

Returns string with characters in reverse order

REPEAT(string, integer)

Returns string repeated integer number of times

REPLACE(string1, string2, string3)

Replace all occurrences in string1 of substring string2 with substring string3

SPLIT(string1, string2)

Split string1 using string2 as delimiter and return the resulting array of strings

SPLIT_PART(string1, string2, integer)

Split string1 using string2 as delimiter and return the element with index integer (counting from one)

STARTS_WITH(string1, string2 [, boolean])

Checks whether string1 starts with string2 as prefix; boolean sets case sensitivity (default is true)

ENDS_WITH(string1, string2 [, boolean])

Checks whether string1 ends with string2 as suffix; boolean sets case sensitivity (default is true)

CONTAINS_SUBSTR(string1, string2 [, boolean])

Checks whether string1 contains string2; boolean sets case sensitivity (default is true)

REGEXP_LIKE(string1, string2)

Checks whether string1 matches the regular expression string2

REGEXP_REPLACE(string1, string2, string3)

Replaces in string1 each substring that matches the regular expression string2 with string3

13.6. Binary String Operators and Functions

The following table provides a summary of binary string operators and functions:

Operator syntax Description

binary || binary

Concatenates two binary strings

POSITION(binary1 IN binary2)

Returns the position of the first occurrence of binary1 in binary2

POSITION(binary1 IN binary2 FROM integer)

Returns the position of the first occurrence of binary1 in binary2 starting at a given point (not standard SQL)

OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ])

Replaces a substring of binary1 with binary2

SUBSTRING(binary FROM integer)

Returns a substring of binary starting at a given point

SUBSTRING(binary FROM integer FOR integer)

Returns a substring of binary starting at a given point with a given length

13.7. Operator Precedence

The following table summarizes the operator precedence and associativity, from highest to lowest.

Operator Associativity

.

left

::

left

[ ] (array element)

left

+ - (unary plus, minus)

right

* / %

left

+ -

left

BETWEEN, IN, LIKE, SIMILAR, OVERLAPS, CONTAINS etc.

-

< > = ⇐ >= <> !=

left

IS NULL, IS FALSE, IS NOT TRUE etc.

-

NOT

right

AND

left

OR

left

13.8. Date and Time Functions

The below table provides a quick reference of Data and Time Functions:

Operator syntax Description

LOCALTIME

Returns the current date and time in the session time zone in a value of datatype TIME

LOCALTIME(precision)

Returns the current date and time in the session time zone in a value of datatype TIME, with precision digits of precision

LOCALTIMESTAMP

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP

LOCALTIMESTAMP(precision)

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP, with precision digits of precision

CURRENT_TIME

Returns the current time in the session time zone, in a value of datatype TIMESTAMP

CURRENT_DATE

Returns the current date in the session time zone, in a value of datatype DATE

CURRENT_TIMESTAMP

Returns the current date and time in the session time zone, in a value of datatype TIMESTAMP

EXTRACT(timeUnit FROM datetime)

Extracts and returns the value of a specified datetime field from a datetime value expression

FLOOR(datetime TO timeUnit)

Rounds datetime down to timeUnit

CEIL(datetime TO timeUnit)

Rounds datetime up to timeUnit

YEAR(date)

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

QUARTER(date)

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

MONTH(date)

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

WEEK(date)

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

DAYOFYEAR(date)

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

DAYOFMONTH(date)

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

DAYOFWEEK(date)

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

HOUR(date)

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

MINUTE(date)

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

SECOND(date)

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

TIMESTAMPADD(timeUnit, integer, datetime)

Returns datetime with an interval of (signed) integer timeUnits added. Equivalent to datetime + INTERVAL 'integer' timeUnit

TIMESTAMPDIFF(timeUnit, datetime, datetime2)

Returns the (signed) number of timeUnit intervals between datetime and datetime2.

LAST_DAY(date)

Returns the date of the last day of the month in a value of datatype DATE; For example, it returns DATE’2020-02-29’ for both DATE’2020-02-10’ and TIMESTAMP’2020-02-10 10:10:10’

CURRENT_MILLIS()

Returns the number of milliseconds since the epoch

CURRENT_SECONDS()

Returns the number of seconds since the epoch

TO_TIMESTAMP_MILLIS(numeric)

Returns the timestamp that is numeric number of milliseconds since the epoch

TO_TIMESTAMP_SECONDS(numeric)

Returns the timestamp that is numeric number of seconds since the epoch

TO_EPOCH_MILLIS(timestamp)

Returns the number of milliseconds from the epoch to timestamp

TO_EPOCH_SECONDS(timestamp)

Returns the number of seconds from the epoch to timestamp

Where:

timeUnit=
	MILLENNIUM | CENTURY | DECADE | YEAR
	| QUARTER | MONTH | WEEK | DOY | DOW | DAY
	| HOUR | MINUTE | SECOND | EPOCH

Note:

  • As stated above, by default, DATE, TIME and TIMESTAMP have no time zone. For those types, there is no implicit time zone, such as UTC (as in Java) or the local time zone. It is left to the user or application to supply a time zone. In turn, TIMESTAMP WITH LOCAL TIME ZONE does not store the time zone internally, times will be stored in UTC and it will rely on the supplied time zone to provide correct semantics.

  • Interval literals may only use time units YEAR, MONTH, DAY, HOUR, MINUTE and SECOND.

====Conditional Functions and Operators

You can find an overview of conditional functions and operators in the following table:

Operator syntax Description

CASE value
WHEN value1 [, value11 ]* THEN result1
[ WHEN valueN [, valueN1 ]* THEN resultN ]*
[ ELSE resultZ ]
END

Enumerated values

CASE
WHEN condition1 THEN result1
[ WHEN conditionN THEN resultN ]*
[ ELSE resultZ ]
END

Enumerated conditions

NULLIF(value, value)

Returns NULL if the values are the same.

For example, NULLIF(5, 5) returns NULL; NULLIF(5, 0) returns 5.

COALESCE(value, value [, value ]*)

returns the value of the first expression that is not NULL, or NULL otherwise

For example, COALESCE(NULL, 5) returns 5.

13.9. Period Functions

A period is defined by two points in time and can be done in the following ways:

period=
	(datetime, datetime)
	| (datetime, interval)
	| PERIOD (datetime, datetime)
	| PERIOD (datetime, interval)

There are a number of supported functions over periods:

Operator syntax Description

period1 CONTAINS datetime dt

Checks if the time period represented by period1 contains the specified datetime 'dt'.

period1 CONTAINS period2

Verifies if the time period represented by period1 contains the entire time period period2.

period1 OVERLAPS period2

Determines if there is any overlap between the time periods represented by period1 and period2. If there is any common time interval, the condition is true.

period1 EQUALS period2

Checks if the time period represented by period1 is equal to the time period period2 in its entirety.

period1 PRECEDES period2

Validates whether the time period represented by period1 comes before (precedes) the time period period2. It doesn’t necessarily mean immediate succession.

period1 IMMEDIATELY PRECEDES period2

Checks if the time period represented by period1 immediately precedes the time period period2 without any gap in between.

period1 SUCCEEDS period2

Verifies whether the time period represented by period1 comes after (succeeds) the time period period2. Similar to PRECEDES, it doesn’t imply immediate succession.

period1 IMMEDIATELY SUCCEEDS period2

Checks if the time period represented by period1 immediately succeeds the time period period2 without any gap.

Where period1 and period2 are period expressions.

13.10. Aggregate functions

Syntax:

aggregateCall=
	agg( [ ALL | DISTINCT ] value [, value ]*)
		[ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
		[ FILTER (WHERE condition) ]
	| agg(*) [ FILTER (WHERE condition) ]

where agg is one of the operators in the following table, or a user-defined aggregate function.

If FILTER is present, the aggregate function only considers rows for which condition evaluates to TRUE.

If DISTINCT is present, duplicate argument values are eliminated before being passed to the aggregate function.

If WITHIN GROUP is present, the aggregate function sorts the input rows according to the ORDER BY clause inside WITHIN GROUP before aggregating values. WITHIN GROUP is only allowed for hypothetical set functions (RANK, DENSE_RANK, PERCENT_RANK and CUME_DIST), inverse distribution functions (PERCENTILE_CONT and PERCENTILE_DISC) and collection functions (COLLECT and LISTAGG).

In orderItem, if expression is a positive integer n, it denotes the nth item in the SELECT clause. As an example:

SELECT F1, F2, F3, F4 FROM T ORDER BY 1,2

This will order by F1 and then F2 which are the first and second in the SELECT clause.

Operator syntax Description

COLLECT( [ ALL | DISTINCT ] value)

Returns a multiset of the values

LISTAGG( [ ALL | DISTINCT ] value [, separator])

Returns values concatenated into a string, delimited by separator (default ‘,’)

COUNT( [ ALL | DISTINCT ] value [, value ]*)

Returns the number of input rows for which value is not null (wholly not null if value is composite)

COUNT(*)

Returns the number of input rows

FUSION(multiset)

Returns the multiset union of multiset across all input values

APPROX_COUNT_DISTINCT(value [, value ]*)

Returns the approximate number of distinct values of value; the database is allowed to use an approximation but is not required to

AVG( [ ALL | DISTINCT ] numeric)

Returns the average (arithmetic mean) of numeric across all input values

SUM( [ ALL | DISTINCT ] numeric)

Returns the sum of numeric across all input values

MAX( [ ALL | DISTINCT ] value)

Returns the maximum value of value across all input values

MIN( [ ALL | DISTINCT ] value)

Returns the minimum value of value across all input values

ANY_VALUE( [ ALL | DISTINCT ] value)

Returns one of the values of value across all input values; this is NOT specified in the SQL standard

BIT_AND( [ ALL | DISTINCT ] value)

Returns the bitwise AND of all non-null input values, or null if none

BIT_OR( [ ALL | DISTINCT ] value)

Returns the bitwise OR of all non-null input values, or null if none

BIT_XOR( [ ALL | DISTINCT ] value)

Returns the bitwise XOR of all non-null input values, or null if none

STDDEV_POP( [ ALL | DISTINCT ] numeric)

Returns the population standard deviation of numeric across all input values

STDDEV_SAMP( [ ALL | DISTINCT ] numeric)

Returns the sample standard deviation of numeric across all input values

STDDEV( [ ALL | DISTINCT ] numeric)

Synonym for STDDEV_SAMP

VAR_POP( [ ALL | DISTINCT ] value)

Returns the population variance (square of the population standard deviation) of numeric across all input values

VAR_SAMP( [ ALL | DISTINCT ] numeric)

Returns the sample variance (square of the sample standard deviation) of numeric across all input values

COVAR_POP(numeric1, numeric2)

Returns the population covariance of the pair (numeric1, numeric2) across all input values

COVAR_SAMP(numeric1, numeric2)

Returns the sample covariance of the pair (numeric1, numeric2) across all input values

REGR_COUNT(numeric1, numeric2)

Returns the number of rows where both dependent and independent expressions are not null

REGR_SXX(numeric1, numeric2)

Returns the sum of squares of the dependent expression in a linear regression model

REGR_SYY(numeric1, numeric2)

Returns the sum of squares of the independent expression in a linear regression model

FIRST_IN_ORDER(column1, column2)

Returns the first value of column1 as ordered by column2 within the aggregate group

LAST_IN_ORDER(column1, column2)

Returns the last value of column1 as ordered by column2 within the aggregate group

13.11. Window Functions

Window functions are a category of SQL functions that operate on a specific "window" or subset of rows within a result set. These functions provide a way to perform calculations across a range of rows that are related to the current row. The syntax is as follows:

windowedAggregateCall=
	agg( [ ALL | DISTINCT ] value [, value ]*)
		[ RESPECT NULLS | IGNORE NULLS ]
		[ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
		[ FILTER (WHERE condition) ]
			OVER window
	| agg(*)
		[ FILTER (WHERE condition) ]
		OVER window

where agg is one of the operators in the following table, or a user-defined aggregate function.

DISTINCT, FILTER, and WITHIN GROUP behave as described for aggregate functions.

The supported operators are summarized in the following table:

Operator syntax Description

COUNT(value [, value ]*) OVER window

Returns the number of rows in window for which value is not null (wholly not null if value is composite)

COUNT(*) OVER window

Returns the number of rows in window

AVG(numeric) OVER window

Returns the average (arithmetic mean) of numeric across all values in window

SUM(numeric) OVER window

Returns the sum of numeric across all values in window

MAX(value) OVER window

Returns the maximum value of value across all values in window

MIN(value) OVER window

Returns the minimum value of value across all values in window

RANK() OVER window

Returns the rank of the current row with gaps; same as ROW_NUMBER of its first peer

DENSE_RANK() OVER window

Returns the rank of the current row without gaps; this function counts peer groups

ROW_NUMBER() OVER window

Returns the number of the current row within its partition, counting from 1

FIRST_VALUE(value) OVER window

Returns value evaluated at the row that is the first row of the window frame

LAST_VALUE(value) OVER window

Returns value evaluated at the row that is the last row of the window frame

LEAD(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

LAG(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

NTH_VALUE(value, nth) OVER window

Returns value evaluated at the row that is the nth row of the window frame

NTILE(value) OVER window

Returns an integer ranging from 1 to value, dividing the partition as equally as possible

Note:

  • You may specify null treatment (IGNORE NULLS, RESPECT NULLS) for FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD and LAG functions. The syntax handled by the parser, but only RESPECT NULLS is implemented at runtime.

13.12. Grouping Functions

LeanXcale supports the following grouping functions:

Operator syntax Description

GROUPING(expression [, expression ]*)

Returns a bit vector of the given grouping expressions

GROUP_ID()

Returns an integer that uniquely identifies the combination of grouping keys

GROUPING_ID(expression [, expression ]*)

Synonym for GROUPING

13.13. Grouped Window Functions

Grouped window functions occur in the GROUP BY clause and define a key value that represents a window containing several rows.

13.13.1. CTUMBLE

In streaming queries, TUMBLE assigns a window for each row of a relation based on a timestamp column. An assigned window is specified by its beginning and ending. All assigned windows have the same length, and that’s why tumbling sometimes is named as “fixed windowing”. CTUMBLE function works in a similar way as TUMBLE does, but for regular tables.

Operator syntax Description

CTUMBLE(datetime COLUMN, interval [, time ])

Indicates a tumbling window of interval for datetime, optionally aligned at time

Here is an example:

SELECT CAST(CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000') AS TIMESTAMP) AS GTIMEST,
	keyid,
	SUM(metric1) AS SUMM1,
	SUM(metric2) AS SUMM2,
FROM TIMEMETRICS
WHERE keyid <= '1'
GROUP BY CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000'), keyid
ORDER BY 1,2

In the query above you will be getting the addition of the metrics for every key and for every hour (60 minute) interval. The aligned at value is causing the intervals to start at minute 01

The result obtained would look similar to:

GTIMEST KEYID SUMM1 SUMM2

2020-09-27 17:01:00

0

10

2.5

2020-09-27 17:01:00

1

10

2.5

2020-09-27 18:01:00

0

20

5.0

2020-09-27 18:01:00

1

20

5.0

2020-09-27 19:01:00

0

30

7.5

2020-09-27 19:01:00

1

30

7.5

13.13.2. Grouped Auxiliary Functions

Grouped auxiliary functions allow you to access properties of a window defined by a grouped window function:

Operator syntax Description

HOP_END(expression, slide, size [, time ])

Returns the value of expression at the end of the window defined by a HOP function call

HOP_START(expression, slide, size [, time ])

Returns the value of expression at the beginning of the window defined by a HOP function call

SESSION_END(expression, interval [, time])

Returns the value of expression at the end of the window defined by a SESSION function call

SESSION_START(expression, interval [, time])

Returns the value of expression at the beginning of the window defined by a SESSION function call

TUMBLE_END(expression, interval [, time ])

Returns the value of expression at the end of the window defined by a TUMBLE function call

TUMBLE_START(expression, interval [, time ])

Returns the value of expression at the beginning of the window defined by a TUMBLE function call

13.14. Time Series Functions: GAPFILL, INTERPOLATE, and LOCF

The GAPFILL operator can be used as a top-level function in a SELECT statement. It operates on timestamp expressions, aligned at a particular time unit, to fill missing values so that the result of the SELECT statement has no gaps with respect to the time unit. The values can then be filled with nulls, interpolated, or repeated. The input of the GAPFILL function is typically a time bucketing aggregate that aligns timestamps to a particular time unit, such as the functions FLOOR or CTUMBLE. Example:

create table abc(a integer primary key, b timestamp, c double);
insert into abc values(1, timestamp'2024-07-11 03:00:00', 10);
insert into abc values(2, timestamp'2024-07-11 03:00:03', 30);
select GAPFILL(ctumble(b, interval '1' second)), avg(c) as c
	from abc
		group by ctumble(b, interval '1' second);

Its signature is:

GAPFILL(ts timestamp [, start_ts timestamp, end_ts timestamp]) RETURNS timestamp

Note that the call to the GAPFILL function should be placed at the top level, i.e., in the SELECT list and not in the GROUP BY expression. This adds two values for the missing timestamps at 03:00:01 and 03:00:02, without filling the values at column c:

2024-07-11 03:00:00.0   10
2024-07-11 03:00:01.0
2024-07-11 03:00:02.0
2024-07-11 03:00:03.0   30

Alternatively, one can choose to fill the gaps with interpolated values or to repeat values by applying the INTERPOLATE or LOCF functions on the aggregated column:

select GAPFILL(ctumble(b, interval '1' second)),
	INTERPOLATE(avg(c))as c_interp, LOCF(avg(c)) as c_rep
from abc
group by ctumble(b, interval '1' second);

The output would be:

2024-07-11 03:00:00.0   10                             	10
2024-07-11 03:00:01.0   16.667                         	10
2024-07-11 03:00:02.0   23.333                         	10
2024-07-11 03:00:03.0   30                             	30

Their signatures are:

INTERPOLATE(a numeric) RETURNS numeric
LOCF(a numeric) RETURNS numeric

The CTUMBLE function can work with intervals that are arbitrary multiples of a millisecond time unit, i.e. milliseconds, seconds, minutes, hours, and days. The FLOOR function can also be used for time bucketing, only on single time units, but can be used with month and year time units. For example:

-- insert one more row for another month
insert into abc
	values(3, timestamp'2024-09-11 03:00:03', 30);

select GAPFILL(floor(b to month)), locf(avg(c))
	from abc
		group by floor(b to month);

Would yield as output:

  2024-07-01 00:00:00.0   20
  2024-08-01 00:00:00.0   20
  2024-09-01 00:00:00.0   30

And if the result should contain timestamps beyond the range defined by the underlying data, the range bounds can be specified as additional arguments to the gapfill function call:

select GAPFILL(floor(b to month), timestamp'2024-06-01 00:00:00',
timestamp'2024-10-01 00:00:00'), locf(avg(c))
	from abc
		group by floor(b to month);

Whose output is:

  2024-06-01 00:00:00.0
  2024-07-01 00:00:00.0   20
  2024-08-01 00:00:00.0   20
  2024-09-01 00:00:00.0   30
  2024-10-01 00:00:00.0   30

13.15. Approximate/Statistical Aggregate Functions

In this section we discuss a set of functions that are designed for estimating the cardinality of large datasets, frequency of distinct elements, percentiles, etc. with reduced memory usage and improved performance compared to exact methods.

13.15.1. HyperLogLog functions: HLL, DISTINCT_COUNT, and RELATIVE_ERROR

HyperLogLog (HLL) applies a probabilistic approach to approximate distinct values count using the HyperLogLog++ algorithm. For large datasets and datasets with high cardinality (many distinct values), this can be much more efficient in both CPU and memory than an exact count. HLL is an aggregate function, which returns an intermediate structure, that can be used to extract the distinct values count or be further involved in another HLL aggregation. The optional buckets parameter specifies the number of buckets for the HyperLogLog algorithm; it must be a value between 16 and 262144; if omitted, a default value of 4096 is applied.

Example:

-- approximates select count(distinct b) from abc; with error err
-- data contains 150000 distinct values; the result is approximately 150000
select distinct_count(hll(b)) as cnt,
	relative_error(hll(b)) as err
from abc;

-- aggregates on intermediate HLL structures
-- performs two-level aggregation on overlapping partitions of the data, each of 100000 distinct values
-- the result is still approximately 150000
select distinct_count(hll(h)) as cnt,
	relative_error(hll(h)) as err
from (
	select hll(b) as h from abc where a <= 100000
	union all
	select hll(b) as h from abc where a > 50000
);

The signatures of the functions are:

HLL(a any [, buckets integer]) RETURNS hll
DISTINCT_COUNT(h hll) RETURNS bigint
RELATIVE_ERROR(h hll) RETURNS double
HLL(h hll) RETURNS hll

13.15.2. Approximate Percentiles with UDDSketch: UDDS, APPROX_PERCENTILE, APPROX_PERCENTILE_RANK, etc.

UDDSketch estimates the value at a given percentile, or the percentile rank of a given value, using the UddSketch algorithm. This estimation is more memory- and CPU-efficient than an exact calculation. UDDS is an aggregate function, which returns an intermediate structure, that can be used to extract percentiles or ranks or be further involved in another UDDS aggregation. If optional parameters buckets and maxerr are omitted, default values of 200 buckets and 0.001 for the maximal error are applied.

Example:

-- approximates the value at the 75-th percentile and the rank (between 0 and 1) of value 13
select approx_percentile(udds(b), 0.75),
	approx_percentile_rank(udds(b), 13)
from abc;

-- aggregates on intermediate UDDS structures; returns the same as the above query
select approx_percentile(udds(u), 0.75),
	approx_percentile_rank(udds(u), 13)
from (
	select udds(b) as u from abc where a < 10
	union all
	select udds(b) as u from abc where a >= 10
);

The signatures are:

UDDS(a numeric [, buckets integer, maxerr double]) RETURNS udds
APPROX_PERCENTILE(u udds, p double) RETURNS double
APPROX_PERCENTILE_ARRAY(u udds, p double[]) RETURNS double[]
APPROX_PERCENTILE_RANK(u udds, v double) RETURNS double
UDDS_ERROR(u udds) RETURNS double
UDDS_MEAN(u udds) RETURNS double
UDDS_NVALS(u udds) RETURNS integer
UDDS(u udds) RETURNS udds

13.15.3. Statistical Aggregate Functions: STATS, STATS_MEAN, STATS_STDDEV, STATS_SKEWNESS, STATS_KURTOSIS, STATS_SUM, STATS_COUNT

STATS is an aggregate function, which performs common statistical analysis into an intermediate structure, that can be used to extract statistical metrics or be further involved in another STATS aggregation.

Example:

-- returns the stddev, kurtosis, and skewness of values in column b
select stats_stddev(stats(b)),
	stats_kurtosis(stats(b)),
	stats_skewness(stats(b))
from abc;

-- aggregates on intermediate STATS structures; returns the same as the above query
select stats_stddev(stats(s)),
	stats_kurtosis(stats(s)),
	stats_skewness(stats(s))
from (
	select stats(b) as s from abc where a < 10
	union all
	select stats(b) as s from abc where a >= 10
);

The signature of the functions are:

STATS(a numeric) RETURNS stats
STATS_MEAN(s stats) RETURNS double
STATS_STDDEV(s stats) RETURNS double
STATS_SKEWNESS(s stats) RETURNS double
STATS_KURTOSIS(s stats) RETURNS double
STATS_SUM(s stats) RETURNS double
STATS_COUNT(s stats) RETURNS bigint
STATS(s stats) RETURNS stats

13.16. JSON Functions

LeanXcale also supports a number of functions to operate over strings containing JSON values. We use the following convention in the following subsections:

  • jsonValue is a character string containing a JSON value;

  • path is a character string containing a JSON path expression; mode flag strict or lax should be specified in the beginning of path.

13.16.1. JSON Query Functions

These are the query function predicates over JSON values:

Operator syntax Description

JSON_EXISTS(jsonValue, path [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ] )

Whether a jsonValue satisfies a search criterion described using JSON path expression path

JSON_VALUE(jsonValue, path [ RETURNING type ] [ { ERROR | NULL | DEFAULT expr } ON EMPTY ] [ { ERROR | NULL | DEFAULT expr } ON ERROR ] )

Extract an SQL scalar from a jsonValue using JSON path expression path

JSON_QUERY(jsonValue, path [ { WITHOUT [ ARRAY ] | WITH [ CONDITIONAL | UNCONDITIONAL ] [ ARRAY ] } WRAPPER ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON EMPTY ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON ERROR ] )

Extract a JSON object or JSON array from jsonValue using the path JSON path expression

JSON_HAS_VALUE(jsonValue1, jsonValue2)

Whether jsonValue1 contains jsonValue2

Note:

  • The ON ERROR and ON EMPTY clauses define the fallback behavior of the function when an error is thrown or a null value is about to be returned.

  • The ARRAY WRAPPER clause defines how to represent a JSON array result in JSON_QUERY function. The following examples compare the wrapper behaviors.

Sample Data:

{"a": "[1,2]", "b": [1,2], "c": "hi"}

These are the JSON query functions: Comparison:

Operator $.a $.b $.c

JSON_VALUE

[1, 2]

error

hi

JSON QUERY WITHOUT ARRAY WRAPPER

error

[1, 2]

error

JSON QUERY WITH UNCONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[ [1,2] ]

[ “hi” ]

JSON QUERY WITH CONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[1,2]

[ “hi” ]

13.16.2. JSON Constructor Functions

These are the JSON value constructor functions:

Operator syntax Description

JSON_OBJECT( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct JSON object using a series of key (name) value (value) pairs

JSON_OBJECTAGG( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON object using a key (name) value (value) pair

JSON_ARRAY( { value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct a JSON array using a series of values (value)

JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON array using a value (value)

Note:

  • The flag FORMAT JSON indicates the value is formatted as JSON character string. When FORMAT JSON is used, the value should be de-parse from JSON character string to a SQL structured value.

  • ON NULL clause defines how the JSON output represents null values. The default null behavior of JSON_OBJECT and JSON_OBJECTAGG is NULL ON NULL, and for JSON_ARRAY and JSON_ARRAYAGG it is ABSENT ON NULL.

  • If ORDER BY clause is provided, JSON_ARRAYAGG sorts the input rows into the specified order before performing aggregation.

13.16.3. JSON Validity Operators

The following functions enable to check the validity of a JSON value:

Operator syntax Description

jsonValue IS JSON [ VALUE ]

Whether jsonValue is a JSON value

jsonValue IS NOT JSON [ VALUE ]

Whether jsonValue is not a JSON value

jsonValue IS JSON SCALAR

Whether jsonValue is a JSON scalar value

jsonValue IS NOT JSON SCALAR

Whether jsonValue is not a JSON scalar value

jsonValue IS JSON OBJECT

Whether jsonValue is a JSON object

jsonValue IS NOT JSON OBJECT

Whether jsonValue is not a JSON object

jsonValue IS JSON ARRAY

Whether jsonValue is a JSON array

jsonValue IS NOT JSON ARRAY

Whether jsonValue is not a JSON array

13.16.4. JSON Auxiliary Functions

These are some JSON auxiliary functions:

Operator syntax Description

JSON_TYPE(jsonValue)

Returns a string value indicating the type of a jsonValue

JSON_DEPTH(jsonValue)

Returns an integer value indicating the depth of a jsonValue

JSON_PRETTY(jsonValue)

Returns a pretty-printing of jsonValue

JSON_LENGTH(jsonValue [, path ])

Returns a integer indicating the length of jsonValue

JSON_KEYS(jsonValue [, path ])

Returns a string indicating the keys of a JSON jsonValue

JSON_REMOVE(jsonValue, path[, path])

Removes data from jsonValue using a series of path expressions and returns the result

Note:

  • JSON_TYPE / JSON_DEPTH / JSON_PRETTY return null if the argument is null

  • JSON_LENGTH / JSON_KEYS / JSON_REMOVE return null if the first argument is null

  • JSON_TYPE generally returns an upper-case string flag indicating the type of the JSON input. Currently supported supported type flags are:

    • INTEGER

    • STRING

    • FLOAT

    • DOUBLE

    • LONG

    • BOOLEAN

    • DATE

    • OBJECT

    • ARRAY

    • NULL

  • JSON_DEPTH defines a JSON value’s depth as follows:

    • An empty array, empty object, or scalar value has depth 1;

    • A non-empty array containing only elements of depth 1 or non-empty object containing only member values of depth 1 has depth 2;

    • Otherwise, a JSON document has depth greater than 2.

  • JSON_LENGTH defines a JSON value’s length as follows:

    • A scalar value has length 1;

    • The length of array or object is the number of elements is contains.

Usage Examples:

13.16.5. JSON_TYPE Example

This statement:

SELECT JSON_TYPE(v) AS c1,
	JSON_TYPE(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
	JSON_TYPE(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
	JSON_TYPE(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4

OBJECT

ARRAY

INTEGER

BOOLEAN

13.16.6. JSON_DEPTH Example

This statement returns:

SELECT JSON_DEPTH(v) AS c1,
	JSON_DEPTH(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
	JSON_DEPTH(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
	JSON_DEPTH(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4

3

2

1

1

13.16.7. JSON_LENGTH Example

This statement:

SELECT JSON_LENGTH(v) AS c1,
	JSON_LENGTH(v, 'lax $.a') AS c2,
	JSON_LENGTH(v, 'strict $.a[0]') AS c3,
	JSON_LENGTH(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true]}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4

1

2

1

1

13.16.8. JSON_KEYS Example

This statement:

SELECT JSON_KEYS(v) AS c1,
	JSON_KEYS(v, 'lax $.a') AS c2,
	JSON_KEYS(v, 'lax $.b') AS c2,
	JSON_KEYS(v, 'strict $.a[0]') AS c3,
	JSON_KEYS(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true],"b": {"c": 30}}')) AS t(v)
LIMIT 10;

returns:

c1 c2 c3 c4 c5

[“a”, “b”]

NULL

[“c”]

NULL

NULL

13.16.9. JSON_REMOVE Example

This statement:

SELECT JSON_REMOVE(v, '$[1]') AS c1
FROM (VALUES ('["a", ["b", "c"], "d"]')) AS t(v)
LIMIT 10;

returns:

c1

[“a”, “d”]

13.16.10. JSON_STORAGE_SIZE Example

This statement:

SELECT
	JSON_STORAGE_SIZE('[100, \"sakila\", [1, 3, 5], 425.05]') AS c1,
	JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"a\", \"c\": \"[1, 3, 5, 7]\"}') AS c2,
	JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"xyz\", \"c\": \"[1, 3, 5, 7]\"}') AS c3,
	JSON_STORAGE_SIZE('[100, \"json\", [[10, 20, 30], 3, 5], 425.05]') AS c4
LIMIT 10;

returns:

c1 c2 c3 c4

29

35

37

36

13.16.11. DECODE Example

This statement:

SELECT DECODE(f1, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c1,
	DECODE(f2, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c2,
	DECODE(f3, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c3,
	DECODE(f4, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c4,
	DECODE(f5, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c5
FROM (VALUES (1, 2, 3, 4, 5)) AS t(f1, f2, f3, f4, f5);

returns:

c1 c2 c3 c4 c5

aa

bb

cc

dd

ee

13.16.12. TRANSLATE example

This statement:

SELECT TRANSLATE('Aa*Bb*Cc''D*d', ' */''%', '_') as c1,
	TRANSLATE('Aa/Bb/Cc''D/d', ' */''%', '_') as c2,
	TRANSLATE('Aa Bb Cc''D d', ' */''%', '_') as c3,
	TRANSLATE('Aa%Bb%Cc''D%d', ' */''%', '_') as c4
FROM (VALUES (true)) AS t(f0);

returns:

c1 c2 c3 c4

Aa_Bb_CcD_d

Aa_Bb_CcD_d

Aa_Bb_CcD_d

Aa_Bb_CcD_d

14. User-Defined Functions

LeanXcale is extensible. One can define new functions using user code. There are several ways to define a function.

To implement a scalar function, there are 3 options:

  • Create a class with a public static eval method, and register the class;

  • Create a class with a public non-static eval method, and a public constructor with no arguments, and register the class;

  • Create a class with one or more public static methods, and register each class/method combination.

To implement an aggregate function, there are 2 options:

  • Create a class with public static init, add and result methods, and register the class;

  • Create a class with public non-static init, add and result methods, and a public constructor with no arguments, and register the class.

Optionally, add a public merge method to the class; this allows Calcite to generate code that merges sub-totals.

Optionally, make your class implement the SqlSplittableAggFunction interface; this allows Calcite to decompose the function across several stages of aggregation, roll up from summary tables, and push it through joins.

To implement a table function, there are 3 options:

To implement a table macro, there are 3 options:

  • Create a class with a static eval method that returns TranslatableTable, and register the class;

  • Create a class with a non-static eval method that returns TranslatableTable, and register the class;

  • Create a class with one or more public static methods that return TranslatableTable, and register each class/method combination.

LeanXcale infers the parameter types and result type of a function from the parameter and return types of the Java method that implements it. Further, you can specify the name and optionality of each parameter using the Parameter annotation.

14.1. Calling Functions with Named and Optional Parameters

Usually when you call a function, you need to specify all of its parameters, in order. But that can be a problem if a function has a lot of parameters, and especially if you want to add more parameters over time.

To solve this problem, the SQL standard allows you to pass parameters by name, and to define parameters which are optional (that is, have a default value that is used if they are not specified).

Suppose you have a function f, declared as in the following pseudo syntax:

FUNCTION f(
	INTEGER a,
	INTEGER b DEFAULT NULL,
	INTEGER c,
	INTEGER d DEFAULT NULL,
	INTEGER e DEFAULT NULL) RETURNS INTEGER

All of the function’s parameters have names, and parameters b, d and e have a default value of NULL and are therefore optional. In LeanXcale NULL is the only allowable default value for optional parameters.

When calling a function with optional parameters, you can omit optional arguments at the end of the list, or use the DEFAULT keyword for any optional arguments. Here are some examples:

  • f(1, 2, 3, 4, 5) provides a value to each parameter, in order;

  • f(1, 2, 3, 4) omits e, which gets its default value, NULL;

  • f(1, DEFAULT, 3) omits d and e, and specifies to use the default value of b;

  • f(1, DEFAULT, 3, DEFAULT, DEFAULT) has the same effect as the previous example;

  • f(1, 2) is not legal, because c is not optional;

  • f(1, 2, DEFAULT, 4) is not legal, because c is not optional.

You can specify arguments by name using the => syntax. If one argument is named, they all must be. Arguments may be in any other, but must not specify any argument more than once, and you need to provide a value for every parameter which is not optional. Here are some examples:

  • f(c => 3, d => 1, a => 0) is equivalent to f(0, NULL, 3, 1, NULL);

  • f(c => 3, d => 1) is not legal, because you have not specified a value for a and a is not optional.

15. Pattern Matching of CEP Events with MATCH_RECOGNIZE

MATCH_RECOGNIZE is a SQL extension for recognizing sequences of events in complex event processing (CEP).

The syntax is as follows:

matchRecognize=
	MATCH_RECOGNIZE '('
		[ PARTITION BY expression [, expression ]* ]
		[ ORDER BY orderItem [, orderItem ]* ]
		[ MEASURES measureColumn [, measureColumn ]* ]
		[ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
		[ AFTER MATCH
			( SKIP TO NEXT ROW
			| SKIP PAST LAST ROW
			| SKIP TO FIRST variable
			| SKIP TO LAST variable
			| SKIP TO variable )
		]
		PATTERN '(' pattern ')'
			[ WITHIN intervalLiteral ]
			[ SUBSET subsetItem [, subsetItem ]* ]
		DEFINE variable AS condition [, variable AS condition ]*
	')'

subsetItem=
	variable = '(' variable [, variable ]* ')'

measureColumn=
	expression AS alias

pattern=
	patternTerm [ '|' patternTerm ]*

patternTerm=
	patternFactor [ patternFactor ]*

patternFactor=
	patternPrimary [ patternQuantifier ]

patternPrimary=
	variable
	|   '$'
	|   '^'
	|   '(' [ pattern ] ')'
	|   '{-' pattern '-}'
	|   PERMUTE '(' pattern [, pattern ]* ')'

patternQuantifier=
	'*'
	|   '*?'
	|   '+'
	|   '+?'
	|   '?'
	|   '??'
	|   '{' { [ minRepeat ], [ maxRepeat ] } '}' ['?']
	|   '{' repeat '}'

intervalLiteral=
	INTERVAL 'string' timeUnit [ TO timeUnit ]

In patternQuantifier, repeat is a positive integer, and minRepeat and maxRepeat are non-negative integers.

16. Copying Data Between Two Tables

Currently, the syntax CREATE TABLE …​ AS SELECT …​ is not supported. In order to copy data from one table to another, one first need to create the new table and then do INSERT/UPSERT …​ SELECT. This is an example:

CREATE TABLE TDEST (f1_key BIGINT, f2_value VARBINARY, PRIMARY KEY(f1_key));

INSERT INTO TDEST SELECT * FROM TORIG;

Additionally, the WITH TEMPORAL AS clause for using a temporal table can be used as in the following example:

CREATE TABLE Persons (
	ID int NOT NULL,
	NAME VARCHAR(20) NOT NULL,
	AGE int NOT NULL,
	SALARY double NOT NULL,
	PRIMARY KEY (ID)
);

INSERT INTO Persons (WITH TEMPORAL AS (SELECT 1, 'Juan', 10, cast(1550.1 as double)) SELECT * FROM temporal);

17. JDBC Function Escapes

JDBC function escapes are a mechanism provided by the JDBC API to invoke database-specific functions and procedures using standard SQL syntax within a Java application. JDBC function escapes are useful to use functions in Java code.

The syntax for JDBC function escapes involves using curly braces {} to enclose the call to the database-specific functions. The format typically looks like {fn functionName(argument1, argument2, …​)}. The following sections provide the different functions for different data types.

17.1. Numeric

These are the Numeric JDBC function escapes:

Operator syntax Description

{fn ABS(numeric)}

Returns the absolute value of numeric

{fn ACOS(numeric)}

Returns the arc cosine of numeric

{fn ASIN(numeric)}

Returns the arc sine of numeric

{fn ATAN(numeric)}

Returns the arc tangent of numeric

{fn ATAN2(numeric, numeric)}

Returns the arc tangent of the numeric coordinates

{fn CBRT(numeric)}

Returns the cube root of numeric

{fn CEILING(numeric)}

Rounds numeric up, and returns the smallest number that is greater than or equal to numeric

{fn COS(numeric)}

Returns the cosine of numeric

{fn COT(numeric)}

Returns the cotangent of numeric

{fn DEGREES(numeric)}

Converts numeric from radians to degrees

{fn EXP(numeric)}

Returns e raised to the power of numeric

{fn FLOOR(numeric)}

Rounds numeric down, and returns the largest number that is less than or equal to numeric

{fn LOG(numeric)}

Returns the natural logarithm (base e) of numeric

{fn LOG10(numeric)}

Returns the base-10 logarithm of numeric

{fn MOD(numeric1, numeric2)}

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

{fn PI()}

Returns a value that is closer than any other value to pi

{fn POWER(numeric1, numeric2)}

Returns numeric1 raised to the power of numeric2

{fn RADIANS(numeric)}

Converts numeric from degrees to radians

{fn RAND(numeric)}

Returns a random double using numeric as the seed value

{fn ROUND(numeric1, numeric2)}

Rounds numeric1 to numeric2 places right to the decimal point

{fn SIGN(numeric)}

Returns the signum of numeric

{fn SIN(numeric)}

Returns the sine of numeric

{fn SQRT(numeric)}

Returns the square root of numeric

{fn TAN(numeric)}

Returns the tangent of numeric

{fn TRUNCATE(numeric1, numeric2)}

Truncates numeric1 to numeric2 places right to the decimal point

17.2. String

These are the String JDBC function escapes:

Operator syntax Description

{fn ASCII(string)}

Returns the ASCII code of the first character of string; if the first character is a non-ASCII character, returns its Unicode code point; returns 0 if string is empty

{fn CONCAT(character, character)}

Returns the concatenation of character strings

{fn INSERT(string1, start, length, string2)}

Inserts string2 into a slot in string1

{fn LCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to lower case

{fn LENGTH(string)}

Returns the number of characters in a string

{fn LOCATE(string1, string2 [, integer])}

Returns the position in string2 of the first occurrence of string1. Searches from the beginning of string2, unless integer is specified.

{fn LEFT(string, length)}

Returns the leftmost length characters from string

{fn LTRIM(string)}

Returns string with leading space characters removed

{fn REPLACE(string, search, replacement)}

Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed

{fn REVERSE(string)}

Returns string with the order of the characters reversed

{fn RIGHT(string, integer)}

Returns the rightmost length characters from string

{fn RTRIM(string)}

Returns string with trailing space characters removed

{fn SUBSTRING(string, offset, length)}

Returns a character string that consists of length characters from string starting at the offset position

{fn UCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to upper case

17.3. Date/time

These are the DATE and TIME JDBC function escapes:

Operator syntax Description

{fn CURDATE()}

Equivalent to CURRENT_DATE

{fn CURTIME()}

Equivalent to LOCALTIME

{fn NOW()}

Equivalent to LOCALTIMESTAMP

{fn YEAR(date)}

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

{fn QUARTER(date)}

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

{fn MONTH(date)}

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

{fn WEEK(date)}

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

{fn DAYOFYEAR(date)}

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

{fn DAYOFMONTH(date)}

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

{fn DAYOFWEEK(date)}

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

{fn HOUR(date)}

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

{fn MINUTE(date)}

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

{fn SECOND(date)}

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

{fn TIMESTAMPADD(timeUnit, count, datetime)}

Adds an interval of count timeUnits to a datetime

{fn TIMESTAMPDIFF(timeUnit, timestamp1, timestamp2)}

Subtracts timestamp1 from timestamp2 and returns the result in timeUnits

17.4. System

These are the system JDBC function escapes:

Operator syntax Description

{fn DATABASE()}

Equivalent to CURRENT_CATALOG

{fn IFNULL(value1, value2)}

Returns value2 if value1 is null

{fn USER()}

Equivalent to CURRENT_USER

17.4.1. Conversion

These are the conversion JDBC function escapes:

Operator syntax Description

{fn CONVERT(value, type)}

Cast value into type