SQL Data Manipulation Language (DML) Reference
This page describes LeanXcale’s SQL DML.
The SQL syntax is described using an EBNF-like form.
1. Isolation Levels
Database transactions provide the so-called ACID properties. The I stands for isolation, which takes care of the consistency of the data in the event of concurrent transactions updating the same row(s). The higher the isolation level, the more consistency is attained, but the more costly it is to execute transactions. The levels, from less to more consistency, are (short names in parentheses):
- Loader: Mode to load data without any other parallel activity. It deactivates conflict checking and logging; a transaction gets the current snapshot as its start timestamp. It is highly efficient for loading.
- Raw_read_committed (rawrc): This isolation level ensures that a transaction can only see committed changes made by other transactions, eliminating dirty reads. However, when a transaction commits, its updates become visible as they reach the involved storage servers, so another transaction may see only part of the updates made by a committed transaction. No conflicts are checked (writes over the same row by parallel transactions, i.e., transactions overlapping in time, are allowed).
- Read_committed (rc): This isolation level ensures that a transaction can only see committed changes made by other transactions. It also eliminates dirty reads. However, it allows non-repeatable reads: a transaction may see different values for the same record within the same transaction if other transactions commit changes. When the transaction starts, it waits for the snapshot to reach the commit timestamp of the last transaction committed in the session, and then takes the snapshot as its start timestamp.
- Snapshot_isolation (si): It guarantees repeatable reads and no phantom reads. This isolation level is (incorrectly) called serializability in other database systems. Only write skew anomalies might happen; where write skew is possible, it can be avoided by using SELECT FOR UPDATE. The transaction, when started, waits for the snapshot to be equal to or higher than the commit timestamp of the last committed transaction in the session, and then takes the current snapshot as its start timestamp.
The default isolation mode is read_committed. The isolation mode is specified in the connection string of the driver with the property mode, using either the long or the short name of the isolation mode. The isolation mode can also be changed via the SQL statement SET ISOLATION LEVEL described in Section 5.
It should be noted that LeanXcale provides fully optimistic concurrency control based on MVCC. This means that when concurrent transactions (such that the start of one lies between the start and commit of another) update the same row, the second transaction performing the update is aborted. Other database systems combine MVCC with locking. In those databases, under snapshot isolation and serializability, concurrent transactions modifying the same row also result in aborting the second transaction performing the update; in read committed mode, however, they do not abort any of the transactions, they just delay the second one until the first one commits or rolls back. When a transaction is aborted due to a conflict, the following exception is raised: … ConflictManagerException …
In case of rollback due to a conflict, the application should handle the exception and retry the transaction.
2. DML SQL Statements
The DML SQL statements supported are:
statement= queryStatement | upsertStatement | insertStatement | updateStatement | deleteStatement | mergeStatement | setIsolationLevelStatement | explainStatement | execStatement | batchStatement
Multiple SQL statements can be provided by separating them with semicolons:
statementList= statement ';' [ statement ';' ]*
3. Query Statement
A query statement can have the following forms:
queryStatement= ( selectStatement | selectForUpdateStatement | unionStatement | intersectStatement | exceptStatement | withStatement | valuesStatement ) [ limitClause ] [ offsetClause ] [ fetchClause ]
3.1. SELECT Statement
The most common statement is the SELECT statement. The SELECT is accepted with FROM (the standard form) and also without FROM, for compatibility with SQL dialects that accept SELECT without FROM:
selectStatement= SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } [ FROM tableExpression ] [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ] [ HAVING booleanExpression ] [ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ] [ FOR UPDATE ] [ ORDER BY orderItem [, orderItem ]* ] [ limitClause | offsetClause [ limitClause | fetchClause ] | fetchClause ]
3.1.1. ALL/DISTINCT clause
Specifying the DISTINCT clause filters out duplicate rows and returns only different (i.e., distinct) values in the resultset. The ALL clause returns all rows, including duplicates; it is the default behavior.
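As an illustration (assuming the ACCOUNTS table used in the write-skew example of Section 3.1.10, with columns accountid, balance, and customerid):

```sql
-- Returns each customerid once, even if a customer has several accounts.
SELECT DISTINCT customerid FROM accounts;

-- Equivalent to the default: every row is returned, duplicates included.
SELECT ALL customerid FROM accounts;
```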
After the optional ALL/DISTINCT clause the list of columns to be projected is provided:
projectItem= expression [ [ AS ] columnAlias ] | tableAlias . *
3.1.2. FROM Clause
In the optional FROM clause, a tableExpression is provided: basically, the list of tables used in the select and how they should be joined (kind of join, join condition):
tableExpression= tableReference [, tableReference ]* | tableExpression [ NATURAL ] [ ( LEFT | RIGHT | FULL ) [ OUTER ] ] JOIN tableExpression [ joinCondition ] | tableExpression CROSS JOIN tableExpression
tableReference= tablePrimary [ matchRecognize ] [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]
tablePrimary= [ [ catalogName . ] schemaName . ] tableName | '(' TABLE [ [ catalogName . ] schemaName . ] tableName ')' | [ LATERAL ] '(' queryStatement ')' | UNNEST '(' expression ')' [ WITH ORDINALITY ] | [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]* ')' ')'
joinCondition= ON booleanExpression | USING '(' column [, column ]* ')'
A SELECT without FROM is equivalent to VALUES (included for compatibility with PostgreSQL), as for example:
SELECT 1, CURRENT_TIMESTAMP;
This just yields the value 1 and the current timestamp in the database server.
3.1.3. WHERE Clause
The optional WHERE clause provides a filter condition so that only the rows fulfilling the condition are selected.
[ WHERE booleanExpression ]
3.1.4. GROUP BY and HAVING Clauses
The optional GROUP BY clause:
[ GROUP BY { groupItem [, groupItem ]* } ] [ HAVING booleanExpression ]
enables aggregating rows with the same values in a set of columns into summary rows.
groupItem= expression | '(' ')' | '(' expression [, expression ]* ')' | CUBE '(' expression [, expression ]* ')' | ROLLUP '(' expression [, expression ]* ')' | GROUPING SETS '(' groupItem [, groupItem ]* ')'
The optional HAVING clause is used in conjunction with the GROUP BY clause to filter the results of a query based on aggregate functions applied to grouped data. It exists because the WHERE keyword cannot be used with aggregate functions.
In the SELECT, HAVING and ORDER BY clauses of an aggregate query, all expressions must be constant within the current group (that is, grouping constants as defined by the GROUP BY clause, or constants), or aggregate functions, or a combination of constants and aggregate functions. Aggregate and grouping functions may only appear in an aggregate query, and only in a SELECT, HAVING or ORDER BY clause.
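A hedged sketch of GROUP BY with HAVING, assuming the ACCOUNTS(accountid, balance, customerid) table from the FOR UPDATE example later in this page; the filter on the aggregate is expressed in HAVING, since WHERE cannot reference SUM:

```sql
-- Total balance per customer, keeping only customers whose total exceeds 5.
SELECT customerid, SUM(balance) AS total_balance
FROM accounts
GROUP BY customerid
HAVING SUM(balance) > 5;
```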
3.1.5. WINDOW Clause
The optional WINDOW clause:
[ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]
is used in conjunction with window functions to define a window of rows over which the window function will operate. Window functions perform a calculation across a set of table rows related to the current row. The WINDOW clause allows you to specify the window frame or partition over which the window function operates:
window= windowName | windowSpec windowSpec= '(' [ windowName ] [ ORDER BY orderItem [, orderItem ]* ] [ PARTITION BY expression [, expression ]* ] [ RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING } | ROWS numericExpression { PRECEDING | FOLLOWING } ] ')'
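For illustration, a sketch of a named window over the assumed ACCOUNTS(accountid, balance, customerid) table; the window is written in the conventional PARTITION BY … ORDER BY order:

```sql
-- Running total of balances per customer, ordered by account id.
-- The named window w is defined once and reused by the window function.
SELECT accountid,
       balance,
       SUM(balance) OVER w AS running_balance
FROM accounts
WINDOW w AS (PARTITION BY customerid ORDER BY accountid);
```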
3.1.6. ORDER BY Clause
The ORDER BY clause allows sorting the resultset by a set of columns and/or expressions:
orderByClause= ORDER BY expression [DESC | ASC] [NULLS (FIRST | LAST)] [, expression [DESC | ASC] [NULLS (FIRST | LAST)]]*
For each expression (it can be just a column name) one can optionally specify whether the order is ascending (ASC) or descending (DESC). One can also optionally specify whether NULLs should appear before other values (NULLS FIRST) or after other values (NULLS LAST). By default, ordering is ascending and NULLs appear at the end (i.e., ASC NULLS LAST for all expressions).
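A small sketch, again assuming the ACCOUNTS table used elsewhere in this page:

```sql
-- Highest balances first; rows with a NULL balance are listed last.
SELECT accountid, balance
FROM accounts
ORDER BY balance DESC NULLS LAST, accountid ASC;
```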
3.1.7. LIMIT Clause
The optional LIMIT clause is used to restrict the number of rows returned by a query. An OFFSET clause can additionally be used to skip a certain number of rows before starting to return rows. The syntax is:
limitClause= [ LIMIT (count | ALL) ]
3.1.8. OFFSET Clause
The optional OFFSET clause:
offsetClause = [ OFFSET start ( ROW | ROWS ) ]
is used in conjunction with the LIMIT clause to specify the starting point within the result set for fetching rows. It allows you to skip a certain number of rows before beginning to return rows.
3.1.9. FETCH Clause
The optional FETCH clause:
fetchClause = [ FETCH ( FIRST | NEXT ) [ count ] ( ROW | ROWS ) ONLY ]
is used to retrieve a specified number of rows from the result set produced by a query. It is often used in conjunction with the OFFSET clause for pagination purposes, where you want to retrieve a specific subset of rows from a larger result set. The FIRST keyword indicates that you want to retrieve the first set of rows from the result set. The NEXT keyword indicates that you want to retrieve the next set of rows from the result set. The ONLY keyword is used in conjunction with the FETCH clause to specify that only the specified number of rows should be returned, and no additional rows should be fetched beyond that limit.
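For pagination, LIMIT/OFFSET and OFFSET/FETCH can express the same page; a sketch following the grammar above, over the assumed ACCOUNTS table (rows 21–30 of the ordered listing):

```sql
-- Page 3 with 10 rows per page, using LIMIT plus OFFSET.
SELECT accountid, balance FROM accounts
ORDER BY accountid
LIMIT 10 OFFSET 20 ROWS;

-- The same page, using OFFSET plus FETCH.
SELECT accountid, balance FROM accounts
ORDER BY accountid
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
```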
3.1.10. FOR UPDATE Clause
The snapshot isolation level can suffer from the write-skew anomaly in some scenarios: when an integrity constraint must be preserved among 2 or more rows, and only a subset of them is updated. An example of write skew is an operation that can withdraw funds from either of two accounts of the same customer as long as the joint balance remains non-negative (i.e., the sum of both account balances is greater than or equal to zero).
Assume that initially two accounts of the same customer with customerid 27 have a joint balance of 10:
+-----------+---------+------------+
| ACCOUNTID | BALANCE | CUSTOMERID |
+-----------+---------+------------+
|         1 |       3 |         27 |
|         2 |       7 |         27 |
+-----------+---------+------------+

SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=27;
+--------+
| EXPR$0 |
+--------+
|     10 |
+--------+
We have 2 concurrent transactions with the isolation level set to snapshot isolation and with AUTOCOMMIT set to OFF. The first transaction wants to withdraw an amount of 5 from accountid 1. It checks the current joint balance:
SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=27;
+--------+
| EXPR$0 |
+--------+
|     10 |
+--------+
Since 5 <= 10, it proceeds to subtract 5 from accountid 1. However, before the transaction commits, a second transaction wants to withdraw an amount of 6 from accountid 2. So it checks the current joint balance:
SELECT SUM(balance) FROM ACCOUNTS WHERE customerid=27;
+--------+
| EXPR$0 |
+--------+
|     10 |
+--------+
Since 6 <= 10, it proceeds to subtract 6 from accountid 2.
Now the first transaction updates accountid 1 with an amount 3-5=-2 and commits:
UPDATE ACCOUNTS SET BALANCE = -2 WHERE ACCOUNTID = 1; COMMIT;
Now the second transaction updates accountid 2 with an amount 7-6=1 and commits:

UPDATE ACCOUNTS SET BALANCE = 1 WHERE ACCOUNTID = 2; COMMIT;

The final result after committing both transactions is:

+-----------+---------+------------+
| ACCOUNTID | BALANCE | CUSTOMERID |
+-----------+---------+------------+
|         1 |      -2 |         27 |
|         2 |       1 |         27 |
+-----------+---------+------------+
The joint balance is now -1, which violates the integrity constraint of keeping a non-negative joint balance. The reason is that both transactions observe the same snapshot.
By using SELECT FOR UPDATE in both transactions, the first transaction performing the update over the pair of involved rows will succeed, while the second one will be aborted: the UPDATE of the first transaction over accountid 1 conflicts with the SELECT FOR UPDATE performed by the second transaction over accountid 1, causing the second transaction to abort.
SELECT FOR UPDATE has the same syntax as SELECT except using the clause FOR UPDATE at the end of the SELECT.
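A sketch of how the write-skew scenario above would be prevented: each transaction reads the involved rows with FOR UPDATE instead of a plain SELECT.

```sql
-- Run by both transactions before computing the withdrawal. The first
-- committing UPDATE then conflicts with the other transaction's
-- SELECT FOR UPDATE over the same rows, aborting that transaction.
SELECT accountid, balance FROM accounts WHERE customerid = 27 FOR UPDATE;
```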
3.2. UNION, INTERSECT, and EXCEPT Statements
The UNION, INTERSECT, and EXCEPT statements combine resultsets by performing the set operations union, intersection, and difference, respectively. They have the following syntax:
unionStatement= queryStatement UNION [ ALL | DISTINCT ] queryStatement
intersectStatement= queryStatement INTERSECT [ ALL | DISTINCT ] queryStatement
exceptStatement= queryStatement EXCEPT [ ALL | DISTINCT ] queryStatement
MINUS is not supported but EXCEPT is equivalent to MINUS.
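A sketch of a set difference; BLACKLIST is a hypothetical table introduced only for this example:

```sql
-- Customers holding an account, minus those listed in the assumed
-- BLACKLIST(customerid) table.
SELECT customerid FROM accounts
EXCEPT
SELECT customerid FROM blacklist;
```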
3.3. WITH Statement
Another form of a query is the WITH statement, which has the following syntax:
withStatement= WITH withItem [, withItem]* queryStatement
withItem= name [ '(' column [, column ]* ')' ] AS '(' queryStatement ')'
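A sketch naming an intermediate resultset and reusing it in the main query, over the assumed ACCOUNTS table:

```sql
-- The totals item computes per-customer balances; the outer query
-- then selects the customers whose total is negative.
WITH totals (customerid, total_balance) AS (
  SELECT customerid, SUM(balance) FROM accounts GROUP BY customerid
)
SELECT customerid FROM totals WHERE total_balance < 0;
```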
3.4. VALUES Statement
The VALUES statement basically defines a table literal with one or more rows, that is, it is a table value constructor. Its syntax is:
values= VALUES rowExpression [, rowExpression ]*
Where a rowExpression is specified as follows:
rowExpression= expression | [ROW] '(' expression [, expression]* ')'
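For example, a two-row, two-column table literal:

```sql
-- Each parenthesized list is one row of the constructed table.
VALUES (1, 'one'), (2, 'two');
```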
3.5. Subqueries
A scalar sub-query is a sub-query used as an expression. If the sub-query returns no rows, the value is NULL; if it returns more than one row, it is an error. IN, EXISTS and scalar sub-queries can occur in any place where an expression can occur (such as the SELECT clause, WHERE clause, ON clause of a JOIN, or as an argument to an aggregate function).
An IN, EXISTS, or scalar sub-query may be correlated, that is, it may refer to tables in the FROM clause of an enclosing query.
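A sketch of a correlated scalar sub-query over the assumed ACCOUNTS table:

```sql
-- Each account is compared against the total balance of its own customer;
-- the inner query refers to the outer alias a, so it is correlated.
SELECT a.accountid
FROM accounts a
WHERE a.balance > 0.5 * (SELECT SUM(b.balance)
                         FROM accounts b
                         WHERE b.customerid = a.customerid);
```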
4. UPSERT, INSERT, and UPDATE Statements
For modifying the database there are four kinds of statements, to upsert, insert, update, and delete rows. UPSERT updates or inserts depending on whether a row with the provided primary key exists or not. INSERT adds a new row with that primary key, if there is no row with that primary key already in the table (otherwise it returns an error). UPDATE modifies a row with a particular primary key, if there is a row in the table with that primary key (otherwise it returns an error). UPSERT is preferred over INSERT and UPDATE since it is the most efficient: INSERT and UPDATE both require an additional read to check whether the row exists or not, basically duplicating the cost of the statement. Their syntax is as follows:
upsertStatement= UPSERT INTO tablePrimary [ '(' column [, column ]* ')' ] queryStatement
insertStatement= INSERT INTO tablePrimary [ '(' column [, column ]* ')' ] queryStatement
updateStatement= UPDATE tablePrimary SET assign [, assign ]* [ WHERE booleanExpression ]
deleteStatement= DELETE FROM tablePrimary [ [ AS ] alias ] [ WHERE booleanExpression ]
In INSERT and UPSERT statements if the list of target columns is not provided, then the statement must have the same number of columns as the target table and the order of the fields in the VALUES clause must be the same as the order of the fields in the table.
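A sketch over the assumed ACCOUNTS(accountid, balance, customerid) table; the row values are illustrative only:

```sql
-- Insert or overwrite the row with primary key 3 in a single statement.
UPSERT INTO accounts (accountid, balance, customerid) VALUES (3, 50, 28);

-- The same intent with INSERT/UPDATE, each paying an extra existence check.
INSERT INTO accounts (accountid, balance, customerid) VALUES (4, 20, 28);
UPDATE accounts SET balance = 25 WHERE accountid = 4;
DELETE FROM accounts WHERE accountid = 4;
```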
5. SET ISOLATION LEVEL Statement
The current isolation level can be changed within the transaction using the SET ISOLATION LEVEL statement that has the following syntax:
setIsolationLevelStatement= SET [ TRANSACTION ] ISOLATION LEVEL ( LOADER | NONE | RAW READ COMMITTED | RRC | READ UNCOMMITTED | READ COMMITTED | RC | SNAPSHOT ISOLATION | SI | SERIALIZABLE | REPEATABLE READ )
LeanXcale has multiversion concurrency control that provides highly efficient isolation management between reads and writes. The standard isolation levels were defined before multiversion concurrency control was invented and are based on the notion of locking. LeanXcale supports 4 isolation levels:
- LOADER (same as NONE). This mode is similar to the standard NONE mode in that it has neither concurrency control nor logging. It should be used for massive loads when nobody else is using the database, for instance, a nightly batch ingesting massive data.
- RAW READ COMMITTED (same as RRC or READ UNCOMMITTED). This mode is a little more relaxed than READ COMMITTED and allows reading the most current version of any row. It is interesting for applications that do not want to read from a consistent snapshot in commit order, but rather the most recent value of each row. When READ UNCOMMITTED is requested, this is the mode used, as it is the closest supported level that provides more isolation.
- READ COMMITTED (same as RC). For each SQL statement it uses the most recent consistent snapshot (i.e., the longest prefix of committed transactions in commit timestamp order). It is the multiversion concurrency control equivalent of locking-based read committed.
- SNAPSHOT ISOLATION (same as SI, SERIALIZABLE, REPEATABLE READ, SESSION, SESSION CONSISTENCY). The transaction gets the current snapshot at start time and performs all reads within the transaction from this snapshot, plus the updates performed by the transaction itself. It is roughly similar to SERIALIZABLE.
Note that the optional [TRANSACTION] keyword is just syntactic sugar for compatibility with other SQL dialects in which it is mandatory.
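For example, either the long or the short name may be used:

```sql
-- Switch the current session to snapshot isolation, then back using
-- the short name for read committed.
SET ISOLATION LEVEL SNAPSHOT ISOLATION;
SET ISOLATION LEVEL RC;
```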
6. EXPLAIN Statement
The EXPLAIN statement provides information about the execution plan of a SQL query. The execution plan describes how the database engine will execute the query and retrieve the required data. This information can be helpful for database administrators, developers, and analysts to optimize and tune their queries for better performance:
explainStatement= EXPLAIN PLAN [ WITH TYPE | WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION ] [ EXCLUDING ATTRIBUTES | INCLUDING [ ALL ] ATTRIBUTES ] [ AS JSON | AS XML ] FOR ( queryStatement | insertStatement | updateStatement | deleteStatement )
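A sketch following the grammar above, over the assumed ACCOUNTS table:

```sql
-- Show the plan chosen for an aggregate query.
EXPLAIN PLAN FOR
SELECT customerid, SUM(balance) FROM accounts GROUP BY customerid;
```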
7. SQL Hints
A hint is an instruction to the optimizer. When writing SQL, you may know information about the data unknown to the optimizer. Hints enable you to force decisions otherwise made by the optimizer.
- Planner enforcers: there is no perfect planner, so it makes sense to implement hints to allow the user better control of the execution. For instance: "never merge this subquery with others" (/*+ no_merge */), or "treat those tables as leading ones" (/*+ leading */) to affect join ordering, etc.
- Append metadata/statistics: some statistics, like "table index for scan" or "skew info of some shuffle keys", are somewhat dynamic for the query, so it is very convenient to configure them with hints, because the planning metadata from the planner might not be sufficiently accurate.
- Operator resource constraints: in many cases, a default resource configuration is assigned to the execution operators, i.e., min parallelism, memory (resource-consuming UDF), special resource requirement (GPU or SSD disk), …
LeanXcale supports two kinds of hints:
- Query Hint: right after the SELECT keyword;
- Table Hint: right after the referenced table name.
This is the syntax:
query= SELECT /*+ hints */ ... FROM tableName /*+ hints */ JOIN tableName /*+ hints */ ...
hints= hintItem [, hintItem ]*
hintItem= hintName | hintName(optionKey=optionVal [, optionKey=optionVal ]*) | hintName(hintOption [, hintOption ]*)
optionKey= simpleIdentifier | stringLiteral
optionVal= stringLiteral
hintOption= simpleIdentifier | numericLiteral | stringLiteral
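A syntax-only sketch using the hint names mentioned above (no_merge, leading); which hints the optimizer actually honors, and with which options, is not specified here:

```sql
-- Query hint right after SELECT, table hint right after the table name.
SELECT /*+ no_merge */ a.accountid
FROM accounts a /*+ leading */
WHERE a.balance > 0;
```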
8. EXEC Statement
The EXEC statement enables invoking a table function. Its syntax is:
execStatement= EXEC '(' [ nonQueryExpression [, nonQueryExpression]* ] ')'
The parameters can be any non-query expression that returns a value of the corresponding parameter type.
9. BATCH Statement
When auto-commit is set and an insert/update/delete operation is requested over the result of a select statement with a large resultset, one might want, instead of a single large commit of the whole set of updates, a commit for each smaller batch of updates. This can be performed with the BATCH statement, which changes the implicit behavior of auto-commit: instead of committing once at the end of the statement, it commits after every number of updates indicated in the parameter of the BATCH statement:
batchStatement= BATCH unsignedIntLiteral | BATCH unsignedIntLiteral (sqlInsert | sqlUpdate | sqlDelete)
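A sketch; ACCOUNTS_ARCHIVE is a hypothetical target table introduced only for this example:

```sql
-- Commit after every 1000 rows inserted from the query,
-- instead of once at the end of the whole INSERT.
BATCH 1000 INSERT INTO accounts_archive SELECT * FROM accounts;
```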
10. Identifiers
Identifiers are the names of tables, columns, and other metadata elements used in an SQL query. Identifiers can be quoted or unquoted. Quoted identifiers, such as "Employee Name", start and end with double quotes. Unquoted identifiers, such as employee, must start with a letter and can only contain letters, digits, and underscores; they are implicitly converted to upper case. Identifiers are case insensitive when unquoted (actually, it is as if they were written in CAPS), and case sensitive when quoted. Quoted identifiers may contain virtually any character, including spaces and other punctuation. To include a double quote in an identifier, escape it with another double quote, as in this example: "An employee called ""Fred"".". Matching identifiers to the name of the referenced object is case-sensitive.
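For illustration, over the assumed ACCOUNTS table:

```sql
-- Unquoted identifiers are upper-cased, so these refer to the same table.
SELECT * FROM accounts;
SELECT * FROM ACCOUNTS;

-- A quoted identifier preserves case; doubled quotes embed a quote character.
SELECT balance AS "Account ""Main"" Balance" FROM accounts;
```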
11. Data Types
11.1. Scalar Data Types
Data type | Description | Range and example literals | Can be stored
---|---|---|---
BOOLEAN | Logical values | Values: TRUE, FALSE, UNKNOWN | Yes
SMALLINT | 2 byte signed integer | Range is -32768 to 32767 | Yes
INTEGER, INT | 4 byte signed integer | Range is -2147483648 to 2147483647 | Yes
BIGINT | 8 byte signed integer | Range is -9223372036854775808 to 9223372036854775807 | Yes
REAL, FLOAT | 4 byte floating point | 6 decimal digits precision | Yes
DOUBLE | 8 byte floating point | 15 decimal digits precision | Yes
DECIMAL(p, s) | Fixed point | Example: 123.45 is a DECIMAL(5, 2) value | Yes
NUMERIC | Fixed point | | Yes
CHAR(n), CHARACTER(n) | Fixed-width character string | 'Hello', '' (empty string), _latin1'Hello', n'Hello', _UTF16'Hello', 'Hello' 'there' (literal split into multiple parts) | Yes
VARCHAR(n), CHARACTER VARYING(n) | Variable-length character string | As CHAR(n) | Yes
BINARY(n) | Fixed-width binary string | x'45F0AB', x'' (empty binary string), x'AB' 'CD' (multi-part binary string literal) | Yes
VARBINARY(n), BINARY VARYING(n) | Variable-length binary string | As BINARY(n) | Yes
DATE | Date | Example: DATE '1969-07-20' | Yes
TIME | Time of day | Example: TIME '20:17:40' | Yes. This datatype has milliseconds precision
TIMESTAMP | Date and time | Example: TIMESTAMP '1969-07-20 20:17:40' | Yes. This datatype has microseconds precision
INTERVAL timeUnit [ TO timeUnit ] | Date time interval | Examples: INTERVAL '1-5' YEAR TO MONTH, INTERVAL '45' DAY, INTERVAL '1 2:34:56.789' DAY TO SECOND | No
TIMESTAMP WITH TIME ZONE | Date and time with time zone | Example: TIMESTAMP '1969-07-20 20:17:40 America/Los Angeles' | No
TIMESTAMP WITH LOCAL TIME ZONE | Date and time with local time zone | Example: TIMESTAMP '1969-07-20 20:17:40 America/Los Angeles' | No
GEOMETRY | Geometry | Example: ST_GeomFromText('POINT (30 10)') | No
BIGINT ARRAY | Array of longs | | Yes
DOUBLE ARRAY | Array of doubles | | Yes
VARCHAR ARRAY | Array of strings | | Yes
11.2. Non Scalar Types
LeanXcale currently supports the following non scalar types:
Type | Description | Example literals | Can be stored
---|---|---|---
ARRAY | Ordered, contiguous collection that may contain duplicates | Example: varchar(10) array | Yes
MULTISET | Unordered collection that may contain duplicates | Example: int multiset | No
MAP | Collection of keys mapped to values | | No
CURSOR | Cursor over the result of executing a query | | No
11.2.1. Array Data Type
A sentence to create a table with array types could be the following:
CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR ARRAY, PRIMARY KEY(id));
Or the following, if you want to restrict the number of characters of each VARCHAR position in the VARCHAR ARRAY.
CREATE TABLE tarry (id INT, longs BIGINT ARRAY, doubles DOUBLE ARRAY, strings VARCHAR (20) ARRAY, PRIMARY KEY(id));
An array literal has the following syntax:
arrayLiteral= ARRAY '[' value [, value]* ']'
If we query the above sample table TARRY, the result will be shown this way:
SELECT * FROM tarry;
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID | LONGS                                          | DOUBLES                                        | STRINGS                 |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
|  1 | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
A NULL value can be inserted into an array field in two ways:
- Replace the NULL keyword with the emptyList() function.
INSERT INTO tarry VALUES (2, ARRAY[2334589765239847563, 0, 44], emptyList(), ARRAY['CUENCA','BILBAO']);
- Specify only the columns in which we want to insert values. In unlisted columns, NULL values will be inserted.
INSERT INTO tarry (id, longs, strings) VALUES (2, array[2334589765239847563, 0, 44], ARRAY['CUENCA','BILBAO']);
The result will be:
+----+------------------------------------------------+------------------------------------------------+-------------------------+
| ID | LONGS                                          | DOUBLES                                        | STRINGS                 |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
|  1 | [-4367937811447089352, 7, 5575876413993699465] | [0.023533916958532797, 5.476721734666199, 1.0] | [MADRID, MURCIA, JEREZ] |
|  2 | [2334589765239847563, 0, 44]                   |                                                | [CUENCA, BILBAO]        |
+----+------------------------------------------------+------------------------------------------------+-------------------------+
11.2.2. Non Scalar Value Constructors
The next table summarizes the constructors for non scalar values:
Operator syntax | Description
---|---
ROW (value [, value ]*) | Creates a row from a list of values.
(value [, value ]* ) | Creates a row from a list of values.
map '[' key ']' | Returns the element of a map with a particular key.
array '[' index ']' | Returns the element at a particular location in an array.
ARRAY '[' value [, value ]* ']' | Creates an array from a list of values.
MAP '[' key, value [, key, value ]* ']' | Creates a map from a list of key-value pairs.
11.2.3. Non Scalar Functions
The supported functions for non scalar types are shown in the below table:
Operator syntax | Description
---|---
ELEMENT(value) | Returns the sole element of an array or multiset; null if the collection is empty; throws if it has more than one element.
CARDINALITY(value) | Returns the number of elements in an array or multiset.
value MEMBER OF multiset | Returns whether the value is a member of multiset.
multiset IS A SET | Whether multiset is a set (has no duplicates).
multiset IS NOT A SET | Whether multiset is not a set (has duplicates).
multiset IS EMPTY | Whether multiset contains zero elements.
multiset IS NOT EMPTY | Whether multiset contains one or more elements.
multiset SUBMULTISET OF multiset2 | Whether multiset is a submultiset of multiset2.
multiset NOT SUBMULTISET OF multiset2 | Whether multiset is not a submultiset of multiset2.
multiset MULTISET UNION [ ALL \| DISTINCT ] multiset2 | Returns the union of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).
multiset MULTISET INTERSECT [ ALL \| DISTINCT ] multiset2 | Returns the intersection of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).
multiset MULTISET EXCEPT [ ALL \| DISTINCT ] multiset2 | Returns the difference of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).
Note: the UNNEST relational operator converts a collection to a relation.
12. Type Conversion
LeanXcale supports both implicit and explicit conversion in expressions.
12.1. Implicit and Explicit Type Conversion
It is recommended to specify explicit conversions, rather than implicit conversions, for these reasons:
-
SQL statements are easier to understand when you use explicit datatype conversion functions.
-
Implicit datatype conversion can have a negative impact on performance, especially if the datatype of a column value is converted to that of a constant rather than the other way around.
-
Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR value may return an unexpected format.
12.2. Explicit Type Conversion
Explicit type conversion is attained with the CAST function:
Operator syntax | Description |
---|---|
CAST(value AS type) |
Converts a value to type. |
The supported types in the CAST function are:
type= typeName [ collectionsTypeName ]*
typeName= sqlTypeName | rowTypeName | compoundIdentifier
sqlTypeName= CHAR [ precision ] [ charSet ] | VARCHAR [ precision ] [ charSet ] | DATE | TIME | TIMESTAMP | GEOMETRY | DECIMAL [ precision [, scale] ] | BOOLEAN | TINYINT | SMALLINT | INTEGER | BIGINT | BINARY [ precision ] | VARBINARY [ precision ] | REAL | DOUBLE | FLOAT | ANY [ precision [, scale] ]
collectionsTypeName= ARRAY | MULTISET
rowTypeName= ROW '(' fieldName1 fieldType1 [ NULL | NOT NULL ] [ , fieldName2 fieldType2 [ NULL | NOT NULL ] ]* ')'
char= CHARACTER | CHAR
varchar= char VARYING | VARCHAR
decimal= DECIMAL | DEC | NUMERIC
integer= INTEGER | INT
varbinary= BINARY VARYING | VARBINARY
double= DOUBLE [ PRECISION ]
time= TIME [ precision ] [ timeZone ]
timestamp= TIMESTAMP [ precision ] [ timeZone ]
charSet= CHARACTER SET charSetName
timeZone= WITHOUT TIME ZONE | WITH LOCAL TIME ZONE
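A sketch of explicit conversions with CAST (literal values chosen for illustration only):

```sql
-- String to fixed point, integer to floating point, and string to date.
SELECT CAST('123.45' AS DECIMAL(5, 2)),
       CAST(1 AS DOUBLE),
       CAST('1969-07-20' AS DATE);
```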
12.3. Implicit Type Conversion
Implicit type conversion refers to the automatic conversion of one data type to another by the database system, without requiring explicit instructions from the programmer. This conversion occurs during the execution of SQL statements or expressions. The following table shows all possible conversions, without regard to the context in which they are made. The following convention is used:
- i: implicit cast
- e: explicit cast
- x: not allowed
FROM - TO | NULL | BOOLEAN | TINYINT | SMALLINT | INT | BIGINT | DECIMAL | FLOAT or REAL | DOUBLE | INTERVAL | DATE | TIME | TIMESTAMP | CHAR or VARCHAR | BINARY or VARBINARY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NULL | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i |
BOOLEAN | x | i | e | e | e | e | e | e | e | x | x | x | x | i | x |
TINYINT | x | e | i | i | i | i | i | i | i | e | x | x | e | i | x |
SMALLINT | x | e | i | i | i | i | i | i | i | e | x | x | e | i | x |
INT | x | e | i | i | i | i | i | i | i | e | x | x | e | i | x |
BIGINT | x | e | i | i | i | i | i | i | i | e | x | x | e | i | x |
DECIMAL | x | e | i | i | i | i | i | i | i | e | x | x | e | i | x |
FLOAT/REAL | x | e | i | i | i | i | i | i | i | x | x | x | e | i | x |
DOUBLE | x | e | i | i | i | i | i | i | i | x | x | x | e | i | x |
INTERVAL | x | x | e | e | e | e | e | x | x | i | x | x | x | e | x |
DATE | x | x | x | x | x | x | x | x | x | x | i | x | i | i | x |
TIME | x | x | x | x | x | x | x | x | x | x | x | i | e | i | x |
TIMESTAMP | x | x | e | e | e | e | e | e | e | x | i | e | i | i | x |
CHAR or VARCHAR | x | e | i | i | i | i | i | i | i | i | i | i | i | i | i |
BINARY or VARBINARY | x | x | x | x | x | x | x | x | x | x | e | e | e | i | i |
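For instance, given a hypothetical table t(i INTEGER, b BOOLEAN), cells marked i in the table above need no cast, while cells marked e require an explicit CAST:

```sql
-- INTEGER to DOUBLE is implicit (i): the integer operand is promoted automatically
SELECT i + 0.5 FROM t;

-- BOOLEAN to INTEGER is explicit (e): it only works through CAST
SELECT CAST(b AS INTEGER) FROM t;

-- BOOLEAN to DATE is marked x: not allowed, even with an explicit CAST
```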
12.3.1. Conversion Contexts and Strategies
- Set operation (UNION, EXCEPT, INTERSECT): analyze the data types of each row in every branch and determine the common type for each pair of fields.
- Binary arithmetic expression (+, -, &, ^, /, %): promote a string operand to the data type of the other numeric operand.
- Binary comparison (=, <, <=, <>, >, >=): if the operands are STRING and TIMESTAMP, promote to TIMESTAMP; make 1 = true and 0 = false always evaluate to TRUE. If there is a numeric type operand, find the common type for both operands.
- IN sub-query: compare the types of the left-hand side and the right-hand side, and find the common type. If it is a non-scalar type, find the widest type for every field.
- IN expression list: compare every expression to find the common type.
- CASE WHEN expression or COALESCE: find the common widest type of the THEN and ELSE operands.
- Character + INTERVAL or character - INTERVAL: promote the character operand to TIMESTAMP.
- Built-in function: look up the type families registered in the checker, and find the family's default type if the checker rules allow it.
- User-defined function (UDF): coerce based on the declared argument types of the eval() method.
- INSERT and UPDATE: coerce a source field to the counterpart target table field's type if the two fields differ in type name or precision (scale).
Note: implicit type coercion is ignored in the following cases:
- One of the types is ANY;
- Type coercion within CHARACTER types, e.g. from CHAR(20) to VARCHAR(30);
- Type coercion from a numeric type to another with higher precedence, e.g. from INT to LONG.
12.3.2. Strategies for Finding Common Type
- If the operator has the expected data types, just take them as the desired ones; e.g., a UDF has an eval() method whose reflected argument types are used.
- If there is no expected data type but the data type families are registered, try to coerce the arguments to the family's default data type; e.g., the String family has VARCHAR as its default type.
- If neither expected data types nor families are specified, try to find the tightest common type of the node types; e.g., INTEGER and DOUBLE will return DOUBLE, and no numeric precision is lost in this case.
- If no tightest common type is found, try to find a wider type; e.g., VARCHAR and INTEGER will return INTEGER. Some precision loss is allowed when widening decimal to fractional, or when promoting to VARCHAR.
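A minimal sketch of the tightest-common-type rule, using a set operation (the alias v is arbitrary):

```sql
-- One branch is INTEGER, the other is a fractional literal; the common type
-- for column v is the fractional type, so the integer branch is widened
SELECT 1 AS v
UNION
SELECT 2.5 AS v;
```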
13. Operators and Functions
The following sections provide a summary of all operators and functions supported by LeanXcale.
13.1. Comparison Operators
The following table provides the list of supported comparison operators:
Operator syntax | Description |
---|---|
value1 = value2 | Equals |
value1 <> value2 | Not equal |
value1 > value2 | Greater than |
value1 >= value2 | Greater than or equal |
value1 < value2 | Less than |
value1 <= value2 | Less than or equal |
value IS NULL | Whether value is null |
value IS NOT NULL | Whether value is not null |
value1 IS DISTINCT FROM value2 | Whether two values are not equal, treating null values as the same |
value1 IS NOT DISTINCT FROM value2 | Whether two values are equal, treating null values as the same |
value1 BETWEEN value2 AND value3 | (value1 >= value2) AND (value1 <= value3) |
value1 NOT BETWEEN value2 AND value3 | Whether value1 is less than value2 or greater than value3 |
string1 LIKE patternString [ ESCAPE string3 ] | Whether string1 matches pattern patternString |
string1 NOT LIKE patternString [ ESCAPE string3 ] | Whether string1 does not match pattern patternString |
string1 SIMILAR TO patternString [ ESCAPE string3 ] | Whether string1 matches regular expression patternString |
string1 NOT SIMILAR TO patternString [ ESCAPE string3 ] | Whether string1 does not match regular expression patternString |
value IN (value [, value]*) | Whether value is equal to a value in a list |
value NOT IN (value [, value]*) | Whether value is not equal to any value in a list |
value IN (sub-query) | Whether value is equal to a row returned by sub-query |
value NOT IN (sub-query) | Whether value is not equal to any row returned by sub-query |
value comparison SOME (sub-query) | Whether value fulfills the comparison for at least one row returned by sub-query |
value comparison ANY (sub-query) | Synonym for SOME |
value comparison ALL (sub-query) | Whether value fulfills the comparison for every row returned by sub-query |
EXISTS (sub-query) | Whether sub-query returns at least one row |
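For example, assuming a hypothetical table emp(name VARCHAR(30), dept VARCHAR(10), salary INTEGER):

```sql
-- BETWEEN and IN with a literal list
SELECT name FROM emp
WHERE salary BETWEEN 1000 AND 2000
  AND dept IN ('SALES', 'HR');

-- IS NOT DISTINCT FROM treats two NULL values as equal, unlike =
SELECT name FROM emp WHERE dept IS NOT DISTINCT FROM NULL;

-- ALL against a sub-query: employees earning at least every HR salary
SELECT name FROM emp
WHERE salary >= ALL (SELECT salary FROM emp WHERE dept = 'HR');
```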
13.2. Regular Expressions
The LIKE and SIMILAR TO operators allow you to perform pattern matching using regular expressions. You can use the special characters ('%', '_'), and you can also use regular expressions with a syntax similar to Perl regular expressions. Specifically, regular expressions may contain:
- char or char sequence: A single character or a sequence of characters that matches itself
- '.': The dot is a wildcard that matches any single character except a newline
- '^': Matches the start of a line
- '[a-z]': Represents a character class. In this example, it matches any lowercase letter from 'a' to 'z'
- '[^a-z]': Represents a negated character class. It matches any character that is not a lowercase letter from 'a' to 'z'
- '\a': Represents a predefined character class for alphanumeric characters. It includes letters (upper and lower case) and digits
- '\u': Represents a predefined character class for uppercase letters
- '\l': Represents a predefined character class for lowercase letters
- '\b': Represents a predefined character class for whitespace characters, such as space and tab
- '\w': Represents a predefined character class for word characters. It includes letters, digits, and underscore
They may also contain the following operators (from lowest to highest precedence):
- '(' and ')': for grouping
- '|': to provide alternatives
- concatenation: implicit, no operator
- '*': zero or more times
- '+': one or more times
- '?': zero or one time
It should be noted that the main difference between LIKE and SIMILAR TO is that LIKE implies the expression starts with the text in the pattern, while SIMILAR TO does not imply a leading '^'.
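A short sketch of both operators, reusing a hypothetical emp table:

```sql
-- LIKE with the SQL wildcards: '%' matches any sequence, '_' a single character
SELECT name FROM emp WHERE name LIKE 'Jo_n%';

-- SIMILAR TO with the regular expression operators listed above
SELECT name FROM emp WHERE name SIMILAR TO '(John|Jane)( .*)?';
```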
13.3. Logical Operators
The table below outlines the logical operators:
Operator syntax | Description |
---|---|
boolean1 OR boolean2 | Whether boolean1 is TRUE or boolean2 is TRUE |
boolean1 AND boolean2 | Whether boolean1 and boolean2 are both TRUE |
NOT boolean | Whether boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN |
boolean IS FALSE | Whether boolean is FALSE; returns FALSE if boolean is UNKNOWN |
boolean IS NOT FALSE | Whether boolean is not FALSE; returns TRUE if boolean is UNKNOWN |
boolean IS TRUE | Whether boolean is TRUE; returns FALSE if boolean is UNKNOWN |
boolean IS NOT TRUE | Whether boolean is not TRUE; returns TRUE if boolean is UNKNOWN |
boolean IS UNKNOWN | Whether boolean is UNKNOWN |
boolean IS NOT UNKNOWN | Whether boolean is not UNKNOWN |
13.4. Mathematical Operators and Functions
This table provides an overview of arithmetic operators and functions:
Operator syntax | Description |
---|---|
+ numeric | Returns numeric |
- numeric | Returns negative numeric |
numeric1 + numeric2 | Returns numeric1 plus numeric2 |
numeric1 - numeric2 | Returns numeric1 minus numeric2 |
numeric1 * numeric2 | Returns numeric1 multiplied by numeric2 |
numeric1 / numeric2 | Returns numeric1 divided by numeric2 |
POWER(numeric1, numeric2) | Returns numeric1 raised to the power of numeric2 |
ABS(numeric) | Returns the absolute value of numeric |
MOD(numeric1, numeric2) | Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative |
SQRT(numeric) | Returns the square root of numeric |
LN(numeric) | Returns the natural logarithm (base e) of numeric |
LOG2(numeric) | Returns the base 2 logarithm of numeric |
LOG10(numeric) | Returns the base 10 logarithm of numeric |
LOG(numeric1 [, numeric2]) | Returns the base numeric2 (default is 10) logarithm of numeric1 |
EXP(numeric) | Returns e raised to the power of numeric |
CEIL(numeric) | Rounds numeric up, returning the smallest integer that is greater than or equal to numeric |
FLOOR(numeric) | Rounds numeric down, returning the largest integer that is less than or equal to numeric |
RANDOM() | Generates a random double between 0 and 1 inclusive |
RANDOM_INT(numeric) | Generates a random integer between 0 and numeric-1 inclusive |
RANDOM_SEED(bigint) | Seeds the random generator; valid within the current session |
ACOS(numeric) | Returns the arc cosine of numeric |
ACOSD(numeric) | Returns the arc cosine of numeric in degrees |
ACOSH(numeric) | Returns the hyperbolic arc cosine of numeric |
ASIN(numeric) | Returns the arc sine of numeric |
ASIND(numeric) | Returns the arc sine of numeric in degrees |
ASINH(numeric) | Returns the hyperbolic arc sine of numeric |
ATAN(numeric) | Returns the arc tangent of numeric |
ATAND(numeric) | Returns the arc tangent of numeric in degrees |
ATANH(numeric) | Returns the hyperbolic arc tangent of numeric |
ATAN2(numeric, numeric) | Returns the arc tangent of the numeric coordinates |
ATAN2D(numeric, numeric) | Returns the arc tangent of the numeric coordinates in degrees |
CBRT(numeric) | Returns the cube root of numeric |
COS(numeric) | Returns the cosine of numeric |
COSD(numeric) | Returns the cosine of numeric; argument in degrees |
COSH(numeric) | Returns the hyperbolic cosine of numeric |
COT(numeric) | Returns the cotangent of numeric |
COTD(numeric) | Returns the cotangent of numeric; argument in degrees |
DEGREES(numeric) | Converts numeric from radians to degrees |
E() | Returns e |
FACTORIAL(bigint) | Returns the factorial of bigint |
GCD(bigint1, bigint2) | Returns the greatest common divisor of bigint1 and bigint2 |
LCM(bigint1, bigint2) | Returns the least common multiple of bigint1 and bigint2 |
PI() | Returns a value that is closer than any other value to pi |
RADIANS(numeric) | Converts numeric from degrees to radians |
ROUND(numeric1 [, numeric2]) | Rounds numeric1 to numeric2 (default 0) places to the right of the decimal point |
SIGN(numeric) | Returns the signum of numeric |
SIN(numeric) | Returns the sine of numeric |
SIND(numeric) | Returns the sine of numeric; argument in degrees |
SINH(numeric) | Returns the hyperbolic sine of numeric |
TAN(numeric) | Returns the tangent of numeric |
TAND(numeric) | Returns the tangent of numeric; argument in degrees |
TANH(numeric) | Returns the hyperbolic tangent of numeric |
TRUNCATE(numeric1 [, numeric2]) | Truncates numeric1 to numeric2 (default 0) places to the right of the decimal point |
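A few of the functions above combined in one query; the expected values follow directly from the definitions in the table:

```sql
SELECT MOD(-7, 3),         -- -1: negative because numeric1 is negative
       ROUND(2.567, 2),    -- 2.57
       TRUNCATE(2.567, 2), -- 2.56
       CEIL(1.2),          -- 2
       FLOOR(-1.2);        -- -2
```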
13.5. String Operators and Functions
A summary of string operators and functions is depicted in the below table:
Operator syntax | Description |
---|---|
string || string | Concatenates two character strings |
CONCAT_WS(string0, string1, string2 [, …]) | Concatenates two or more character strings using string0 as delimiter |
CHAR_LENGTH(string) | Returns the number of characters in a character string |
CHARACTER_LENGTH(string) | As CHAR_LENGTH(string) |
UPPER(string) | Returns a character string converted to upper case |
LOWER(string) | Returns a character string converted to lower case |
POSITION(string1 IN string2) | Returns the position of the first occurrence of string1 in string2 |
POSITION(string1 IN string2 FROM integer) | Returns the position of the first occurrence of string1 in string2 starting at a given point (not standard SQL) |
TRIM( { BOTH | LEADING | TRAILING } string1 FROM string2) | Removes the longest string containing only the characters in string1 from the start/end/both ends of string2 |
OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ]) | Replaces a substring of string1 with string2 |
SUBSTRING(string FROM integer) | Returns a substring of a character string starting at a given point |
SUBSTRING(string FROM integer FOR integer) | Returns a substring of a character string starting at a given point with a given length |
INITCAP(string) | Returns string with the first letter of each word converted to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters |
LEFT(string, integer) | Returns the first integer characters of string |
RIGHT(string, integer) | Returns the last integer characters of string |
LPAD(string1, integer [, string2]) | Fills up string1 to length integer by prepending the characters of string2 (a space by default) |
RPAD(string1, integer [, string2]) | Fills up string1 to length integer by appending the characters of string2 (a space by default) |
REVERSE(string) | Returns string with characters in reverse order |
REPEAT(string, integer) | Returns string repeated integer number of times |
REPLACE(string1, string2, string3) | Replaces all occurrences in string1 of substring string2 with substring string3 |
SPLIT(string1, string2) | Splits string1 using string2 as delimiter and returns the resulting array of strings |
SPLIT_PART(string1, string2, integer) | Splits string1 using string2 as delimiter and returns the element with index integer (counting from one) |
STARTS_WITH(string1, string2 [, boolean]) | Checks whether string1 starts with string2 as prefix; boolean sets case sensitivity (default is true) |
ENDS_WITH(string1, string2 [, boolean]) | Checks whether string1 ends with string2 as suffix; boolean sets case sensitivity (default is true) |
CONTAINS_SUBSTR(string1, string2 [, boolean]) | Checks whether string1 contains string2; boolean sets case sensitivity (default is true) |
REGEXP_LIKE(string1, string2) | Checks whether string1 matches the regular expression string2 |
REGEXP_REPLACE(string1, string2, string3) | Replaces in string1 each substring that matches the regular expression string2 with string3 |
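For example (the expected values follow from the definitions above):

```sql
SELECT SPLIT_PART('a,b,c', ',', 2),    -- 'b' (elements count from one)
       LPAD('7', 3, '0'),              -- '007'
       REPLACE('banana', 'na', 'xy'),  -- 'baxyxy'
       POSITION('an' IN 'banana');     -- 2
```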
13.6. Binary String Operators and Functions
The following table provides a summary of binary string operators and functions:
Operator syntax | Description |
---|---|
binary || binary | Concatenates two binary strings |
POSITION(binary1 IN binary2) | Returns the position of the first occurrence of binary1 in binary2 |
POSITION(binary1 IN binary2 FROM integer) | Returns the position of the first occurrence of binary1 in binary2 starting at a given point (not standard SQL) |
OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ]) | Replaces a substring of binary1 with binary2 |
SUBSTRING(binary FROM integer) | Returns a substring of binary starting at a given point |
SUBSTRING(binary FROM integer FOR integer) | Returns a substring of binary starting at a given point with a given length |
13.7. Operator Precedence
The following table summarizes the operator precedence and associativity, from highest to lowest.
Operator | Associativity |
---|---|
. | left |
:: | left |
[ ] (array element) | left |
+ - (unary plus, minus) | right |
* / % | left |
+ - | left |
BETWEEN, IN, LIKE, SIMILAR, OVERLAPS, CONTAINS etc. | - |
< > = <= >= <> != | left |
IS NULL, IS FALSE, IS NOT TRUE etc. | - |
NOT | right |
AND | left |
OR | left |
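Because AND binds tighter than OR, parentheses can change the meaning of a mixed predicate; a sketch over a hypothetical table t(a, b, c):

```sql
-- Parsed as: a = 1 OR (b = 2 AND c = 3)
SELECT * FROM t WHERE a = 1 OR b = 2 AND c = 3;

-- Parentheses force the other grouping
SELECT * FROM t WHERE (a = 1 OR b = 2) AND c = 3;
```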
13.8. Date and Time Functions
The table below provides a quick reference of date and time functions:
Operator syntax | Description |
---|---|
LOCALTIME | Returns the current date and time in the session time zone in a value of datatype TIME |
LOCALTIME(precision) | Returns the current date and time in the session time zone in a value of datatype TIME, with precision digits of precision |
LOCALTIMESTAMP | Returns the current date and time in the session time zone in a value of datatype TIMESTAMP |
LOCALTIMESTAMP(precision) | Returns the current date and time in the session time zone in a value of datatype TIMESTAMP, with precision digits of precision |
CURRENT_TIME | Returns the current time in the session time zone, in a value of datatype TIMESTAMP |
CURRENT_DATE | Returns the current date in the session time zone, in a value of datatype DATE |
CURRENT_TIMESTAMP | Returns the current date and time in the session time zone, in a value of datatype TIMESTAMP |
EXTRACT(timeUnit FROM datetime) | Extracts and returns the value of a specified datetime field from a datetime value expression |
FLOOR(datetime TO timeUnit) | Rounds datetime down to timeUnit |
CEIL(datetime TO timeUnit) | Rounds datetime up to timeUnit |
YEAR(date) | Equivalent to EXTRACT(YEAR FROM date) |
QUARTER(date) | Equivalent to EXTRACT(QUARTER FROM date) |
MONTH(date) | Equivalent to EXTRACT(MONTH FROM date) |
WEEK(date) | Equivalent to EXTRACT(WEEK FROM date) |
DAYOFYEAR(date) | Equivalent to EXTRACT(DOY FROM date) |
DAYOFMONTH(date) | Equivalent to EXTRACT(DAY FROM date) |
DAYOFWEEK(date) | Equivalent to EXTRACT(DOW FROM date) |
HOUR(date) | Equivalent to EXTRACT(HOUR FROM date) |
MINUTE(date) | Equivalent to EXTRACT(MINUTE FROM date) |
SECOND(date) | Equivalent to EXTRACT(SECOND FROM date) |
TIMESTAMPADD(timeUnit, integer, datetime) | Returns datetime with an interval of (signed) integer timeUnits added. Equivalent to datetime + INTERVAL 'integer' timeUnit |
TIMESTAMPDIFF(timeUnit, datetime, datetime2) | Returns the (signed) number of timeUnit intervals between datetime and datetime2 |
LAST_DAY(date) | Returns the date of the last day of the month in a value of datatype DATE; for example, it returns DATE '2020-02-29' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10' |
CURRENT_MILLIS() | Returns the number of milliseconds since the epoch |
CURRENT_SECONDS() | Returns the number of seconds since the epoch |
TO_TIMESTAMP_MILLIS(numeric) | Returns the timestamp that is numeric number of milliseconds since the epoch |
TO_TIMESTAMP_SECONDS(numeric) | Returns the timestamp that is numeric number of seconds since the epoch |
TO_EPOCH_MILLIS(timestamp) | Returns the number of milliseconds from the epoch to timestamp |
TO_EPOCH_SECONDS(timestamp) | Returns the number of seconds from the epoch to timestamp |
Where:
timeUnit= MILLENNIUM | CENTURY | DECADE | YEAR | QUARTER | MONTH | WEEK | DOY | DOW | DAY | HOUR | MINUTE | SECOND | EPOCH
Note:
- As stated above, by default, DATE, TIME and TIMESTAMP have no time zone. For those types, there is no implicit time zone, such as UTC (as in Java) or the local time zone; it is left to the user or application to supply a time zone. In turn, TIMESTAMP WITH LOCAL TIME ZONE does not store the time zone internally: times are stored in UTC, and the supplied time zone is relied upon to provide correct semantics.
- Interval literals may only use the time units YEAR, MONTH, DAY, HOUR, MINUTE and SECOND.
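A few of the functions above combined (the literals are arbitrary):

```sql
SELECT EXTRACT(YEAR FROM DATE '2024-07-11'),                       -- 2024
       FLOOR(TIMESTAMP '2024-07-11 03:14:15' TO HOUR),             -- 2024-07-11 03:00:00
       TIMESTAMPADD(DAY, 7, DATE '2024-07-11'),                    -- 2024-07-18
       TIMESTAMPDIFF(MONTH, DATE '2024-01-01', DATE '2024-07-01'); -- 6
```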
Conditional Functions and Operators
You can find an overview of conditional functions and operators in the following table:
Operator syntax | Description |
---|---|
CASE value | Enumerated values |
CASE | Enumerated conditions |
NULLIF(value, value) | Returns NULL if the values are the same |
COALESCE(value, value [, value ]*) | Returns the value of the first expression that is not NULL, or NULL otherwise |
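A sketch combining the conditional forms, reusing a hypothetical emp table (the bonus column is also illustrative):

```sql
SELECT CASE dept WHEN 'HR' THEN 'people' ELSE 'other' END,    -- CASE value
       CASE WHEN salary > 2000 THEN 'high' ELSE 'normal' END, -- CASE with conditions
       COALESCE(bonus, 0),      -- 0 when bonus is NULL
       NULLIF(dept, 'UNKNOWN')  -- NULL when dept = 'UNKNOWN'
FROM emp;
```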
13.9. Period Functions
A period is defined by two points in time and can be expressed in the following ways:
period= (datetime, datetime) | (datetime, interval) | PERIOD (datetime, datetime) | PERIOD (datetime, interval)
There are a number of supported functions over periods:
Operator syntax | Description |
---|---|
period1 CONTAINS datetime dt | Checks if the time period represented by period1 contains the specified datetime dt |
period1 CONTAINS period2 | Verifies if the time period represented by period1 contains the entire time period period2 |
period1 OVERLAPS period2 | Determines if there is any overlap between the time periods represented by period1 and period2. If there is any common time interval, the condition is true |
period1 EQUALS period2 | Checks if the time period represented by period1 is equal to the time period period2 in its entirety |
period1 PRECEDES period2 | Validates whether the time period represented by period1 comes before (precedes) the time period period2. It doesn't necessarily mean immediate precedence |
period1 IMMEDIATELY PRECEDES period2 | Checks if the time period represented by period1 immediately precedes the time period period2 without any gap in between |
period1 SUCCEEDS period2 | Verifies whether the time period represented by period1 comes after (succeeds) the time period period2. Similar to PRECEDES, it doesn't imply immediate succession |
period1 IMMEDIATELY SUCCEEDS period2 | Checks if the time period represented by period1 immediately succeeds the time period period2 without any gap |
Where period1 and period2 are period expressions.
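For instance, using the PERIOD constructor form over a hypothetical reservations table:

```sql
-- Reservations that overlap the first half of July 2024
SELECT *
FROM reservations r
WHERE PERIOD (r.check_in, r.check_out)
      OVERLAPS PERIOD (TIMESTAMP '2024-07-01 00:00:00',
                       TIMESTAMP '2024-07-15 00:00:00');
```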
13.10. Aggregate functions
Syntax:
aggregateCall= agg( [ ALL | DISTINCT ] value [, value ]*) [ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ] [ FILTER (WHERE condition) ] | agg(*) [ FILTER (WHERE condition) ]
where agg is one of the operators in the following table, or a user-defined aggregate function.
If FILTER is present, the aggregate function only considers rows for which condition evaluates to TRUE.
If DISTINCT is present, duplicate argument values are eliminated before being passed to the aggregate function.
If WITHIN GROUP is present, the aggregate function sorts the input rows according to the ORDER BY clause inside WITHIN GROUP before aggregating values. WITHIN GROUP is only allowed for hypothetical set functions (RANK, DENSE_RANK, PERCENT_RANK and CUME_DIST), inverse distribution functions (PERCENTILE_CONT and PERCENTILE_DISC) and collection functions (COLLECT and LISTAGG).
In orderItem, if expression is a positive integer n, it denotes the nth item in the SELECT clause. As an example:
SELECT F1, F2, F3, F4 FROM T ORDER BY 1,2
This will order by F1 and then F2, which are the first and second items in the SELECT clause.
Operator syntax | Description |
---|---|
COLLECT( [ ALL | DISTINCT ] value) | Returns a multiset of the values |
LISTAGG( [ ALL | DISTINCT ] value [, separator]) | Returns values concatenated into a string, delimited by separator (default ',') |
COUNT( [ ALL | DISTINCT ] value [, value ]*) | Returns the number of input rows for which value is not null (wholly not null if value is composite) |
COUNT(*) | Returns the number of input rows |
FUSION(multiset) | Returns the multiset union of multiset across all input values |
APPROX_COUNT_DISTINCT(value [, value ]*) | Returns the approximate number of distinct values of value; the database is allowed to use an approximation but is not required to |
AVG( [ ALL | DISTINCT ] numeric) | Returns the average (arithmetic mean) of numeric across all input values |
SUM( [ ALL | DISTINCT ] numeric) | Returns the sum of numeric across all input values |
MAX( [ ALL | DISTINCT ] value) | Returns the maximum value of value across all input values |
MIN( [ ALL | DISTINCT ] value) | Returns the minimum value of value across all input values |
ANY_VALUE( [ ALL | DISTINCT ] value) | Returns one of the values of value across all input values; this is NOT specified in the SQL standard |
BIT_AND( [ ALL | DISTINCT ] value) | Returns the bitwise AND of all non-null input values, or null if none |
BIT_OR( [ ALL | DISTINCT ] value) | Returns the bitwise OR of all non-null input values, or null if none |
BIT_XOR( [ ALL | DISTINCT ] value) | Returns the bitwise XOR of all non-null input values, or null if none |
STDDEV_POP( [ ALL | DISTINCT ] numeric) | Returns the population standard deviation of numeric across all input values |
STDDEV_SAMP( [ ALL | DISTINCT ] numeric) | Returns the sample standard deviation of numeric across all input values |
STDDEV( [ ALL | DISTINCT ] numeric) | Synonym for STDDEV_SAMP |
VAR_POP( [ ALL | DISTINCT ] value) | Returns the population variance (square of the population standard deviation) of numeric across all input values |
VAR_SAMP( [ ALL | DISTINCT ] numeric) | Returns the sample variance (square of the sample standard deviation) of numeric across all input values |
COVAR_POP(numeric1, numeric2) | Returns the population covariance of the pair (numeric1, numeric2) across all input values |
COVAR_SAMP(numeric1, numeric2) | Returns the sample covariance of the pair (numeric1, numeric2) across all input values |
REGR_COUNT(numeric1, numeric2) | Returns the number of rows where both dependent and independent expressions are not null |
REGR_SXX(numeric1, numeric2) | Returns the sum of squares of the dependent expression in a linear regression model |
REGR_SYY(numeric1, numeric2) | Returns the sum of squares of the independent expression in a linear regression model |
FIRST_IN_ORDER(column1, column2) | Returns the first value of column1 as ordered by column2 within the aggregate group |
LAST_IN_ORDER(column1, column2) | Returns the last value of column1 as ordered by column2 within the aggregate group |
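Combining the clauses described above over a hypothetical emp(name, dept, salary) table:

```sql
SELECT dept,
       COUNT(*) AS headcount,
       -- Average over positive salaries only, thanks to FILTER
       AVG(salary) FILTER (WHERE salary > 0) AS avg_paid,
       -- Names concatenated in alphabetical order via WITHIN GROUP
       LISTAGG(name, '; ') WITHIN GROUP (ORDER BY name) AS members
FROM emp
GROUP BY dept;
```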
13.11. Window Functions
Window functions are a category of SQL functions that operate on a specific "window" or subset of rows within a result set. These functions provide a way to perform calculations across a range of rows that are related to the current row. The syntax is as follows:
windowedAggregateCall= agg( [ ALL | DISTINCT ] value [, value ]*) [ RESPECT NULLS | IGNORE NULLS ] [ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ] [ FILTER (WHERE condition) ] OVER window | agg(*) [ FILTER (WHERE condition) ] OVER window
where agg is one of the operators in the following table, or a user-defined aggregate function.
DISTINCT, FILTER, and WITHIN GROUP behave as described for aggregate functions.
The supported operators are summarized in the following table:
Operator syntax | Description |
---|---|
COUNT(value [, value ]*) OVER window | Returns the number of rows in window for which value is not null (wholly not null if value is composite) |
COUNT(*) OVER window | Returns the number of rows in window |
AVG(numeric) OVER window | Returns the average (arithmetic mean) of numeric across all values in window |
SUM(numeric) OVER window | Returns the sum of numeric across all values in window |
MAX(value) OVER window | Returns the maximum value of value across all values in window |
MIN(value) OVER window | Returns the minimum value of value across all values in window |
RANK() OVER window | Returns the rank of the current row with gaps; same as ROW_NUMBER of its first peer |
DENSE_RANK() OVER window | Returns the rank of the current row without gaps; this function counts peer groups |
ROW_NUMBER() OVER window | Returns the number of the current row within its partition, counting from 1 |
FIRST_VALUE(value) OVER window | Returns value evaluated at the row that is the first row of the window frame |
LAST_VALUE(value) OVER window | Returns value evaluated at the row that is the last row of the window frame |
LEAD(value, offset, default) OVER window | Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL |
LAG(value, offset, default) OVER window | Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL |
NTH_VALUE(value, nth) OVER window | Returns value evaluated at the row that is the nth row of the window frame |
NTILE(value) OVER window | Returns an integer ranging from 1 to value, dividing the partition as equally as possible |
Note:
- You may specify null treatment (IGNORE NULLS, RESPECT NULLS) for the FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD and LAG functions. The syntax is handled by the parser, but only RESPECT NULLS is implemented at runtime.
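A sketch of several window functions over a hypothetical sales(region, day, amount) table:

```sql
SELECT day,
       amount,
       -- Running 7-row sum over the ordered partition
       SUM(amount) OVER (ORDER BY day
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS weekly_sum,
       -- Previous row's amount, defaulting to 0 on the first row
       LAG(amount, 1, 0) OVER (ORDER BY day) AS prev_amount,
       -- Position of the row within its region, by decreasing amount
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS pos_in_region
FROM sales;
```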
13.12. Grouping Functions
LeanXcale supports the following grouping functions:
Operator syntax | Description |
---|---|
GROUPING(expression [, expression ]*) | Returns a bit vector of the given grouping expressions |
GROUP_ID() | Returns an integer that uniquely identifies the combination of grouping keys |
GROUPING_ID(expression [, expression ]*) | Synonym for GROUPING |
13.13. Grouped Window Functions
Grouped window functions occur in the GROUP BY clause and define a key value that represents a window containing several rows.
13.13.1. CTUMBLE
In streaming queries, TUMBLE assigns a window to each row of a relation based on a timestamp column. An assigned window is specified by its beginning and end. All assigned windows have the same length, which is why tumbling is sometimes called "fixed windowing". The CTUMBLE function works in the same way as TUMBLE, but for regular tables.
Operator syntax | Description |
---|---|
CTUMBLE(datetime COLUMN, interval [, time ]) | Indicates a tumbling window of interval for datetime, optionally aligned at time |
Here is an example:
SELECT CAST(CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000') AS TIMESTAMP) AS GTIMEST,
keyid,
SUM(metric1) AS SUMM1,
SUM(metric2) AS SUMM2
FROM TIMEMETRICS
WHERE keyid <= '1'
GROUP BY CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000'), keyid
ORDER BY 1,2
In the query above you get the sum of the metrics for every key and for every hour (60-minute) interval. The alignment value causes the intervals to start at minute 01.
The result obtained would look similar to:
GTIMEST | KEYID | SUMM1 | SUMM2 |
---|---|---|---|
2020-09-27 17:01:00 | 0 | 10 | 2.5 |
2020-09-27 17:01:00 | 1 | 10 | 2.5 |
2020-09-27 18:01:00 | 0 | 20 | 5.0 |
2020-09-27 18:01:00 | 1 | 20 | 5.0 |
2020-09-27 19:01:00 | 0 | 30 | 7.5 |
2020-09-27 19:01:00 | 1 | 30 | 7.5 |
13.13.2. Grouped Auxiliary Functions
Grouped auxiliary functions allow you to access properties of a window defined by a grouped window function:
Operator syntax | Description |
---|---|
HOP_END(expression, slide, size [, time ]) | Returns the value of expression at the end of the window defined by a HOP function |
HOP_START(expression, slide, size [, time ]) | Returns the value of expression at the beginning of the window defined by a HOP function |
SESSION_END(expression, interval [, time]) | Returns the value of expression at the end of the window defined by a SESSION function |
SESSION_START(expression, interval [, time]) | Returns the value of expression at the beginning of the window defined by a SESSION function |
TUMBLE_END(expression, interval [, time ]) | Returns the value of expression at the end of the window defined by a TUMBLE function |
TUMBLE_START(expression, interval [, time ]) | Returns the value of expression at the beginning of the window defined by a TUMBLE function |
13.14. Time Series Functions: GAPFILL, INTERPOLATE, and LOCF
The GAPFILL operator can be used as a top-level function in a SELECT statement. It operates on timestamp expressions, aligned at a particular time unit, to fill missing values so that the result of the SELECT statement has no gaps with respect to the time unit. The values can then be filled with nulls, interpolated, or repeated. The input of the GAPFILL function is typically a time bucketing aggregate that aligns timestamps to a particular time unit, such as the functions FLOOR or CTUMBLE. Example:
create table abc(a integer primary key, b timestamp, c double);
insert into abc values(1, timestamp'2024-07-11 03:00:00', 10);
insert into abc values(2, timestamp'2024-07-11 03:00:03', 30);
select GAPFILL(ctumble(b, interval '1' second)), avg(c) as c
from abc
group by ctumble(b, interval '1' second);
Its signature is:
GAPFILL(ts timestamp [, start_ts timestamp, end_ts timestamp]) RETURNS timestamp
Note that the call to the GAPFILL function should be placed at the top level, i.e., in the SELECT list and not in the GROUP BY expression. The query above adds two rows for the missing timestamps at 03:00:01 and 03:00:02, without filling the values of column c:
2024-07-11 03:00:00.0   10
2024-07-11 03:00:01.0
2024-07-11 03:00:02.0
2024-07-11 03:00:03.0   30
Alternatively, one can choose to fill the gaps with interpolated values or to repeat values by applying the INTERPOLATE or LOCF functions on the aggregated column:
select GAPFILL(ctumble(b, interval '1' second)),
INTERPOLATE(avg(c))as c_interp, LOCF(avg(c)) as c_rep
from abc
group by ctumble(b, interval '1' second);
The output would be:
2024-07-11 03:00:00.0   10       10
2024-07-11 03:00:01.0   16.667   10
2024-07-11 03:00:02.0   23.333   10
2024-07-11 03:00:03.0   30       30
Their signatures are:
INTERPOLATE(a numeric) RETURNS numeric
LOCF(a numeric) RETURNS numeric
The CTUMBLE function can work with intervals that are arbitrary multiples of a millisecond-based time unit, i.e., milliseconds, seconds, minutes, hours, and days. The FLOOR function can also be used for time bucketing; it operates only on single time units, but it additionally supports the MONTH and YEAR time units. For example:
-- insert one more row for another month
insert into abc
values(3, timestamp'2024-09-11 03:00:03', 30);
select GAPFILL(floor(b to month)), locf(avg(c))
from abc
group by floor(b to month);
Would yield as output:
2024-07-01 00:00:00.0   20
2024-08-01 00:00:00.0   20
2024-09-01 00:00:00.0   30
If the result should contain timestamps beyond the range covered by the underlying data, the range bounds can be specified as additional arguments to the GAPFILL function call:
select GAPFILL(floor(b to month), timestamp'2024-06-01 00:00:00',
timestamp'2024-10-01 00:00:00'), locf(avg(c))
from abc
group by floor(b to month);
Whose output is:
2024-06-01 00:00:00.0
2024-07-01 00:00:00.0   20
2024-08-01 00:00:00.0   20
2024-09-01 00:00:00.0   30
2024-10-01 00:00:00.0   30
13.15. Approximate/Statistical Aggregate Functions
In this section we discuss a set of functions that are designed for estimating the cardinality of large datasets, frequency of distinct elements, percentiles, etc. with reduced memory usage and improved performance compared to exact methods.
13.15.1. HyperLogLog functions: HLL, DISTINCT_COUNT, and RELATIVE_ERROR
HyperLogLog (HLL) applies a probabilistic approach to approximate the count of distinct values using the HyperLogLog++ algorithm. For large datasets and datasets with high cardinality (many distinct values), this can be much more efficient in both CPU and memory than an exact count. HLL is an aggregate function that returns an intermediate structure, which can be used to extract the distinct-value count or be further involved in another HLL aggregation. The optional buckets parameter specifies the number of buckets for the HyperLogLog algorithm; it must be a value between 16 and 262144; if omitted, a default value of 4096 is applied.
Example:
-- approximates select count(distinct b) from abc; with error err
-- data contains 150000 distinct values; the result is approximately 150000
select distinct_count(hll(b)) as cnt,
relative_error(hll(b)) as err
from abc;
-- aggregates on intermediate HLL structures
-- performs two-level aggregation on overlapping partitions of the data, each of 100000 distinct values
-- the result is still approximately 150000
select distinct_count(hll(h)) as cnt,
relative_error(hll(h)) as err
from (
select hll(b) as h from abc where a <= 100000
union all
select hll(b) as h from abc where a > 50000
);
The signatures of the functions are:
HLL(a any [, buckets integer]) RETURNS hll
DISTINCT_COUNT(h hll) RETURNS bigint
RELATIVE_ERROR(h hll) RETURNS double
HLL(h hll) RETURNS hll
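HLL also composes naturally with GROUP BY to estimate per-group cardinalities. A minimal sketch, assuming a hypothetical table events(user_id, day):

```sql
-- hypothetical table: events(user_id integer, day date)
-- approximate number of distinct users per day, with the estimation error
SELECT day,
       DISTINCT_COUNT(HLL(user_id)) AS approx_users,
       RELATIVE_ERROR(HLL(user_id)) AS err
FROM events
GROUP BY day;
```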
13.15.2. Approximate Percentiles with UDDSketch: UDDS, APPROX_PERCENTILE, APPROX_PERCENTILE_RANK, etc.
UDDSketch estimates the value at a given percentile, or the percentile rank of a given value, using the UDDSketch algorithm. This estimation is more memory- and CPU-efficient than an exact calculation. UDDS is an aggregate function that returns an intermediate structure, which can be used to extract percentiles or ranks, or be further involved in another UDDS aggregation. If the optional parameters buckets and maxerr are omitted, default values of 200 buckets and 0.001 for the maximal error are applied.
Example:
-- approximates the value at the 75-th percentile and the rank (between 0 and 1) of value 13
select approx_percentile(udds(b), 0.75),
approx_percentile_rank(udds(b), 13)
from abc;
-- aggregates on intermediate UDDS structures; returns the same as the above query
select approx_percentile(udds(u), 0.75),
approx_percentile_rank(udds(u), 13)
from (
select udds(b) as u from abc where a < 10
union all
select udds(b) as u from abc where a >= 10
);
The signatures are:
UDDS(a numeric [, buckets integer, maxerr double]) RETURNS udds
APPROX_PERCENTILE(u udds, p double) RETURNS double
APPROX_PERCENTILE_ARRAY(u udds, p double[]) RETURNS double[]
APPROX_PERCENTILE_RANK(u udds, v double) RETURNS double
UDDS_ERROR(u udds) RETURNS double
UDDS_MEAN(u udds) RETURNS double
UDDS_NVALS(u udds) RETURNS integer
UDDS(u udds) RETURNS udds
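The remaining extractors can be applied to the same intermediate structure. A sketch, assuming a table with a numeric column b:

```sql
-- several percentiles at once, plus metadata about the sketch itself
SELECT APPROX_PERCENTILE_ARRAY(udds(b), ARRAY[0.5, 0.9, 0.99]) AS pcts,
       UDDS_MEAN(udds(b)) AS mean,
       UDDS_NVALS(udds(b)) AS n,
       UDDS_ERROR(udds(b)) AS err
FROM abc;
```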
13.15.3. Statistical Aggregate Functions: STATS, STATS_MEAN, STATS_STDDEV, STATS_SKEWNESS, STATS_KURTOSIS, STATS_SUM, STATS_COUNT
STATS is an aggregate function that accumulates common statistical measures into an intermediate structure, which can be used to extract statistical metrics or be further involved in another STATS aggregation.
Example:
-- returns the stddev, kurtosis, and skewness of values in column b
select stats_stddev(stats(b)),
stats_kurtosis(stats(b)),
stats_skewness(stats(b))
from abc;
-- aggregates on intermediate STATS structures; returns the same as the above query
select stats_stddev(stats(s)),
stats_kurtosis(stats(s)),
stats_skewness(stats(s))
from (
select stats(b) as s from abc where a < 10
union all
select stats(b) as s from abc where a >= 10
);
The signatures of the functions are:
STATS(a numeric) RETURNS stats
STATS_MEAN(s stats) RETURNS double
STATS_STDDEV(s stats) RETURNS double
STATS_SKEWNESS(s stats) RETURNS double
STATS_KURTOSIS(s stats) RETURNS double
STATS_SUM(s stats) RETURNS double
STATS_COUNT(s stats) RETURNS bigint
STATS(s stats) RETURNS stats
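The remaining extractors work the same way over a single intermediate structure. A sketch, assuming a numeric column b:

```sql
-- mean, sum, and count extracted from the same STATS structure
SELECT STATS_MEAN(stats(b)) AS mean,
       STATS_SUM(stats(b)) AS total,
       STATS_COUNT(stats(b)) AS n
FROM abc;
```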
13.16. JSON Functions
LeanXcale also supports a number of functions that operate over strings containing JSON values. The following subsections use these conventions:
- jsonValue is a character string containing a JSON value;
- path is a character string containing a JSON path expression; the mode flag strict or lax should be specified at the beginning of path.
13.16.1. JSON Query Functions
These are the query functions and predicates over JSON values:
Operator syntax | Description |
---|---|
JSON_EXISTS(jsonValue, path [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ] ) |
Whether a jsonValue satisfies a search criterion described using JSON path expression path |
JSON_VALUE(jsonValue, path [ RETURNING type ] [ { ERROR | NULL | DEFAULT expr } ON EMPTY ] [ { ERROR | NULL | DEFAULT expr } ON ERROR ] ) |
Extract an SQL scalar from a jsonValue using JSON path expression path |
JSON_QUERY(jsonValue, path [ { WITHOUT [ ARRAY ] | WITH [ CONDITIONAL | UNCONDITIONAL ] [ ARRAY ] } WRAPPER ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON EMPTY ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON ERROR ] ) |
Extract a JSON object or JSON array from jsonValue using the path JSON path expression |
JSON_HAS_VALUE(jsonValue1, jsonValue2) |
Whether jsonValue1 contains jsonValue2 |
Note:
- The ON ERROR and ON EMPTY clauses define the fallback behavior of the function when an error is thrown or a null value is about to be returned.
- The ARRAY WRAPPER clause defines how to represent a JSON array result in the JSON_QUERY function. The following examples compare the wrapper behaviors.
Sample Data:
{"a": "[1,2]", "b": [1,2], "c": "hi"}
Comparison:
Operator | $.a | $.b | $.c |
---|---|---|---|
JSON_VALUE |
[1, 2] |
error |
hi |
JSON_QUERY WITHOUT ARRAY WRAPPER |
error |
[1, 2] |
error |
JSON_QUERY WITH UNCONDITIONAL ARRAY WRAPPER |
[ "[1,2]" ] |
[ [1,2] ] |
[ "hi" ] |
JSON_QUERY WITH CONDITIONAL ARRAY WRAPPER |
[ "[1,2]" ] |
[1,2] |
[ "hi" ] |
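The comparison above can be reproduced with a query such as the following sketch over the sample data:

```sql
SELECT JSON_VALUE(v, 'lax $.c') AS val_c,
       JSON_QUERY(v, 'lax $.b' WITHOUT ARRAY WRAPPER) AS q_b,
       JSON_QUERY(v, 'lax $.c' WITH UNCONDITIONAL ARRAY WRAPPER) AS q_c,
       JSON_EXISTS(v, 'lax $.a') AS has_a
FROM (VALUES ('{"a": "[1,2]", "b": [1,2], "c": "hi"}')) AS t(v);
```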
13.16.2. JSON Constructor Functions
These are the JSON value constructor functions:
Operator syntax | Description |
---|---|
JSON_OBJECT( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] ) |
Construct JSON object using a series of key (name) value (value) pairs |
JSON_OBJECTAGG( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } [ { NULL | ABSENT } ON NULL ] ) |
Aggregate function to construct a JSON object using a key (name) value (value) pair |
JSON_ARRAY( { value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] ) |
Construct a JSON array using a series of values (value) |
JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL | ABSENT } ON NULL ] ) |
Aggregate function to construct a JSON array using a value (value) |
Note:
- The flag FORMAT JSON indicates that the value is formatted as a JSON character string. When FORMAT JSON is used, the value is de-parsed from a JSON character string into an SQL structured value.
- The ON NULL clause defines how the JSON output represents null values. The default null behavior of JSON_OBJECT and JSON_OBJECTAGG is NULL ON NULL, and for JSON_ARRAY and JSON_ARRAYAGG it is ABSENT ON NULL.
- If the ORDER BY clause is provided, JSON_ARRAYAGG sorts the input rows into the specified order before performing the aggregation.
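A short sketch of the constructor functions, reusing the abc table from the earlier examples:

```sql
-- a JSON object from key/value pairs, and a JSON array from a list of values
SELECT JSON_OBJECT('id' VALUE 1, 'name' VALUE 'Juan') AS obj,
       JSON_ARRAY(1, 2, 3) AS arr;

-- aggregate variant: one JSON array per result, ordered by timestamp
SELECT JSON_ARRAYAGG(c ORDER BY b) AS cs
FROM abc;
```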
13.16.3. JSON Validity Operators
The following operators check the validity of a JSON value:
Operator syntax | Description |
---|---|
jsonValue IS JSON [ VALUE ] |
Whether jsonValue is a JSON value |
jsonValue IS NOT JSON [ VALUE ] |
Whether jsonValue is not a JSON value |
jsonValue IS JSON SCALAR |
Whether jsonValue is a JSON scalar value |
jsonValue IS NOT JSON SCALAR |
Whether jsonValue is not a JSON scalar value |
jsonValue IS JSON OBJECT |
Whether jsonValue is a JSON object |
jsonValue IS NOT JSON OBJECT |
Whether jsonValue is not a JSON object |
jsonValue IS JSON ARRAY |
Whether jsonValue is a JSON array |
jsonValue IS NOT JSON ARRAY |
Whether jsonValue is not a JSON array |
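A quick sketch of the validity operators on literal strings; the last operand is not valid JSON:

```sql
SELECT '{"a": 1}' IS JSON OBJECT     AS is_obj,
       '[1, 2]'   IS JSON ARRAY      AS is_arr,
       '42'       IS JSON SCALAR     AS is_scalar,
       'oops'     IS NOT JSON VALUE  AS not_json
FROM (VALUES (0)) AS t(dummy);
```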
13.16.4. JSON Auxiliary Functions
These are some JSON auxiliary functions:
Operator syntax | Description |
---|---|
JSON_TYPE(jsonValue) |
Returns a string value indicating the type of a jsonValue |
JSON_DEPTH(jsonValue) |
Returns an integer value indicating the depth of a jsonValue |
JSON_PRETTY(jsonValue) |
Returns a pretty-printing of jsonValue |
JSON_LENGTH(jsonValue [, path ]) |
Returns an integer indicating the length of jsonValue |
JSON_KEYS(jsonValue [, path ]) |
Returns a string indicating the keys of jsonValue |
JSON_REMOVE(jsonValue, path [, path ]*) |
Removes data from jsonValue using a series of path expressions and returns the result |
Note:
- JSON_TYPE/JSON_DEPTH/JSON_PRETTY return null if the argument is null.
- JSON_LENGTH/JSON_KEYS/JSON_REMOVE return null if the first argument is null.
- JSON_TYPE generally returns an upper-case string flag indicating the type of the JSON input. Currently supported type flags are:
  - INTEGER
  - STRING
  - FLOAT
  - DOUBLE
  - LONG
  - BOOLEAN
  - DATE
  - OBJECT
  - ARRAY
  - NULL
- JSON_DEPTH defines a JSON value’s depth as follows:
  - An empty array, empty object, or scalar value has depth 1;
  - A non-empty array containing only elements of depth 1, or a non-empty object containing only member values of depth 1, has depth 2;
  - Otherwise, a JSON document has depth greater than 2.
- JSON_LENGTH defines a JSON value’s length as follows:
  - A scalar value has length 1;
  - The length of an array or object is the number of elements it contains.
Usage Examples:
13.16.5. JSON_TYPE Example
This statement:
SELECT JSON_TYPE(v) AS c1,
       JSON_TYPE(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
       JSON_TYPE(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
       JSON_TYPE(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;
returns:
c1 | c2 | c3 | c4 |
---|---|---|---|
OBJECT |
ARRAY |
INTEGER |
BOOLEAN |
13.16.6. JSON_DEPTH Example
This statement:
SELECT JSON_DEPTH(v) AS c1,
       JSON_DEPTH(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
       JSON_DEPTH(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
       JSON_DEPTH(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;
returns:
c1 | c2 | c3 | c4 |
---|---|---|---|
3 |
2 |
1 |
1 |
13.16.7. JSON_LENGTH Example
This statement:
SELECT JSON_LENGTH(v) AS c1,
       JSON_LENGTH(v, 'lax $.a') AS c2,
       JSON_LENGTH(v, 'strict $.a[0]') AS c3,
       JSON_LENGTH(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true]}')) AS t(v)
LIMIT 10;
returns:
c1 | c2 | c3 | c4 |
---|---|---|---|
1 |
2 |
1 |
1 |
13.16.8. JSON_KEYS Example
This statement:
SELECT JSON_KEYS(v) AS c1,
       JSON_KEYS(v, 'lax $.a') AS c2,
       JSON_KEYS(v, 'lax $.b') AS c3,
       JSON_KEYS(v, 'strict $.a[0]') AS c4,
       JSON_KEYS(v, 'strict $.a[1]') AS c5
FROM (VALUES ('{"a": [10, true],"b": {"c": 30}}')) AS t(v)
LIMIT 10;
returns:
c1 | c2 | c3 | c4 | c5 |
---|---|---|---|---|
["a", "b"] |
NULL |
["c"] |
NULL |
NULL |
13.16.9. JSON_REMOVE Example
This statement:
SELECT JSON_REMOVE(v, '$[1]') AS c1
FROM (VALUES ('["a", ["b", "c"], "d"]')) AS t(v)
LIMIT 10;
returns:
c1 |
---|
["a", "d"] |
13.16.10. JSON_STORAGE_SIZE Example
This statement:
SELECT JSON_STORAGE_SIZE('[100, \"sakila\", [1, 3, 5], 425.05]') AS c1,
       JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"a\", \"c\": \"[1, 3, 5, 7]\"}') AS c2,
       JSON_STORAGE_SIZE('{\"a\": 10, \"b\": \"xyz\", \"c\": \"[1, 3, 5, 7]\"}') AS c3,
       JSON_STORAGE_SIZE('[100, \"json\", [[10, 20, 30], 3, 5], 425.05]') AS c4
LIMIT 10;
returns:
c1 | c2 | c3 | c4 |
---|---|---|---|
29 |
35 |
37 |
36 |
13.16.11. DECODE Example
This statement:
SELECT DECODE(f1, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') AS c1,
       DECODE(f2, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') AS c2,
       DECODE(f3, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') AS c3,
       DECODE(f4, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') AS c4,
       DECODE(f5, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') AS c5
FROM (VALUES (1, 2, 3, 4, 5)) AS t(f1, f2, f3, f4, f5);
returns:
c1 | c2 | c3 | c4 | c5 |
---|---|---|---|---|
aa |
bb |
cc |
dd |
ee |
13.16.12. TRANSLATE example
This statement:
SELECT TRANSLATE('Aa*Bb*Cc''D*d', ' */''%', '_') AS c1,
       TRANSLATE('Aa/Bb/Cc''D/d', ' */''%', '_') AS c2,
       TRANSLATE('Aa Bb Cc''D d', ' */''%', '_') AS c3,
       TRANSLATE('Aa%Bb%Cc''D%d', ' */''%', '_') AS c4
FROM (VALUES (true)) AS t(f0);
returns:
c1 | c2 | c3 | c4 |
---|---|---|---|
Aa_Bb_CcD_d |
Aa_Bb_CcD_d |
Aa_Bb_CcD_d |
Aa_Bb_CcD_d |
14. User-Defined Functions
LeanXcale is extensible. One can define new functions using user code. There are several ways to define a function.
To implement a scalar function, there are 3 options:
- Create a class with a public static eval method, and register the class;
- Create a class with a public non-static eval method and a public constructor with no arguments, and register the class;
- Create a class with one or more public static methods, and register each class/method combination.
To implement an aggregate function, there are 2 options:
- Create a class with public static init, add, and result methods, and register the class;
- Create a class with public non-static init, add, and result methods and a public constructor with no arguments, and register the class.
Optionally, add a public merge method to the class; this allows Calcite to generate code that merges sub-totals.
Optionally, make your class implement the SqlSplittableAggFunction interface; this allows Calcite to decompose the function across several stages of aggregation, roll up from summary tables, and push it through joins.
To implement a table function, there are 3 options:
- Create a class with a static eval method that returns ScannableTable or QueryableTable, and register the class;
- Create a class with a non-static eval method that returns ScannableTable or QueryableTable, and register the class;
- Create a class with one or more public static methods that return ScannableTable or QueryableTable, and register each class/method combination.
To implement a table macro, there are 3 options:
- Create a class with a static eval method that returns TranslatableTable, and register the class;
- Create a class with a non-static eval method that returns TranslatableTable, and register the class;
- Create a class with one or more public static methods that return TranslatableTable, and register each class/method combination.
LeanXcale infers the parameter types and result type of a function from the parameter and return types of the Java method that implements it. Further, you can specify the name and optionality of each parameter using the Parameter annotation.
14.1. Calling Functions with Named and Optional Parameters
Usually when you call a function, you need to specify all of its parameters, in order. But that can be a problem if a function has a lot of parameters, and especially if you want to add more parameters over time.
To solve this problem, the SQL standard allows you to pass parameters by name, and to define parameters which are optional (that is, have a default value that is used if they are not specified).
Suppose you have a function f, declared as in the following pseudo syntax:
FUNCTION f(
  INTEGER a,
  INTEGER b DEFAULT NULL,
  INTEGER c,
  INTEGER d DEFAULT NULL,
  INTEGER e DEFAULT NULL)
RETURNS INTEGER
All of the function’s parameters have names, and parameters b, d, and e have a default value of NULL and are therefore optional.
In LeanXcale NULL is the only allowable default value for optional parameters.
When calling a function with optional parameters, you can omit optional arguments at the end of the list, or use the DEFAULT keyword for any optional arguments. Here are some examples:
- f(1, 2, 3, 4, 5) provides a value to each parameter, in order;
- f(1, 2, 3, 4) omits e, which gets its default value, NULL;
- f(1, DEFAULT, 3) omits d and e, and specifies to use the default value of b;
- f(1, DEFAULT, 3, DEFAULT, DEFAULT) has the same effect as the previous example;
- f(1, 2) is not legal, because c is not optional;
- f(1, 2, DEFAULT, 4) is not legal, because c is not optional.
You can specify arguments by name using the => syntax. If one argument is named, they all must be. Arguments may be in any order, but you must not specify any argument more than once, and you need to provide a value for every parameter that is not optional. Here are some examples:
- f(c => 3, d => 1, a => 0) is equivalent to f(0, NULL, 3, 1, NULL);
- f(c => 3, d => 1) is not legal, because you have not specified a value for a and a is not optional.
15. Pattern Matching of CEP Events with MATCH_RECOGNIZE
MATCH_RECOGNIZE is a SQL extension for recognizing sequences of events in complex event processing (CEP).
The syntax is as follows:
matchRecognize=
      MATCH_RECOGNIZE '('
      [ PARTITION BY expression [, expression ]* ]
      [ ORDER BY orderItem [, orderItem ]* ]
      [ MEASURES measureColumn [, measureColumn ]* ]
      [ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
      [ AFTER MATCH
            ( SKIP TO NEXT ROW
            | SKIP PAST LAST ROW
            | SKIP TO FIRST variable
            | SKIP TO LAST variable
            | SKIP TO variable ) ]
      PATTERN '(' pattern ')'
      [ WITHIN intervalLiteral ]
      [ SUBSET subsetItem [, subsetItem ]* ]
      DEFINE variable AS condition [, variable AS condition ]*
      ')'

subsetItem=
      variable = '(' variable [, variable ]* ')'

measureColumn=
      expression AS alias

pattern=
      patternTerm [ '|' patternTerm ]*

patternTerm=
      patternFactor [ patternFactor ]*

patternFactor=
      patternPrimary [ patternQuantifier ]

patternPrimary=
      variable
  |   '$'
  |   '^'
  |   '(' [ pattern ] ')'
  |   '{-' pattern '-}'
  |   PERMUTE '(' pattern [, pattern ]* ')'

patternQuantifier=
      '*'
  |   '*?'
  |   '+'
  |   '+?'
  |   '?'
  |   '??'
  |   '{' { [ minRepeat ], [ maxRepeat ] } '}' [ '?' ]
  |   '{' repeat '}'

intervalLiteral=
      INTERVAL 'string' timeUnit [ TO timeUnit ]
In patternQuantifier, repeat is a positive integer, and minRepeat and maxRepeat are non-negative integers.
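As an illustration, here is a hedged sketch of the classic V-shape pattern (a price that falls and then rises), assuming a hypothetical table ticker(symbol, tstamp, price) and using the standard PREV/LAST navigation functions:

```sql
SELECT *
FROM ticker
  MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp
    MEASURES STRT.tstamp AS start_tstamp,   -- first row of the match
             LAST(DOWN.tstamp) AS bottom_tstamp,
             LAST(UP.tstamp) AS end_tstamp
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (STRT DOWN+ UP+)
    DEFINE DOWN AS DOWN.price < PREV(DOWN.price),  -- strictly falling
           UP AS UP.price > PREV(UP.price)         -- strictly rising
  ) AS mr;
```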
16. Copying Data Between Two Tables
Currently, the syntax CREATE TABLE … AS SELECT … is not supported. To copy data from one table to another, one first needs to create the new table and then do INSERT/UPSERT … SELECT. This is an example:
CREATE TABLE TDEST (f1_key BIGINT, f2_value VARBINARY, PRIMARY KEY(f1_key));

INSERT INTO TDEST SELECT * FROM TORIG;
Additionally, the WITH TEMPORAL AS clause for using a temporal table can be used as in the following example:
CREATE TABLE Persons (
  ID int NOT NULL,
  NAME VARCHAR(20) NOT NULL,
  AGE int NOT NULL,
  SALARY double NOT NULL,
  PRIMARY KEY (ID)
);

INSERT INTO Persons
(WITH TEMPORAL AS (SELECT 1, 'Juan', 10, cast(1550.1 as double))
 SELECT * FROM temporal);
17. JDBC Function Escapes
JDBC function escapes are a mechanism provided by the JDBC API to invoke database-specific functions and procedures using a standard, portable syntax within a Java application.
The syntax for JDBC function escapes involves using curly braces {} to enclose the call to the database-specific functions. The format typically looks like {fn functionName(argument1, argument2, …)}. The following sections provide the different functions for different data types.
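For instance, a statement using escapes might look like the following sketch, reusing the Persons table from the earlier example; the JDBC driver rewrites each {fn …} escape into the engine's native function call:

```sql
SELECT {fn UCASE(NAME)} AS uname,
       {fn ROUND(SALARY, 0)} AS sal,
       {fn SQRT(AGE)} AS r
FROM Persons;
```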
17.1. Numeric
These are the Numeric JDBC function escapes:
Operator syntax | Description |
---|---|
{fn ABS(numeric)} |
Returns the absolute value of numeric |
{fn ACOS(numeric)} |
Returns the arc cosine of numeric |
{fn ASIN(numeric)} |
Returns the arc sine of numeric |
{fn ATAN(numeric)} |
Returns the arc tangent of numeric |
{fn ATAN2(numeric, numeric)} |
Returns the arc tangent of the numeric coordinates |
{fn CBRT(numeric)} |
Returns the cube root of numeric |
{fn CEILING(numeric)} |
Rounds numeric up, and returns the smallest number that is greater than or equal to numeric |
{fn COS(numeric)} |
Returns the cosine of numeric |
{fn COT(numeric)} |
Returns the cotangent of numeric |
{fn DEGREES(numeric)} |
Converts numeric from radians to degrees |
{fn EXP(numeric)} |
Returns e raised to the power of numeric |
{fn FLOOR(numeric)} |
Rounds numeric down, and returns the largest number that is less than or equal to numeric |
{fn LOG(numeric)} |
Returns the natural logarithm (base e) of numeric |
{fn LOG10(numeric)} |
Returns the base-10 logarithm of numeric |
{fn MOD(numeric1, numeric2)} |
Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative |
{fn PI()} |
Returns a value that is closer than any other value to pi |
{fn POWER(numeric1, numeric2)} |
Returns numeric1 raised to the power of numeric2 |
{fn RADIANS(numeric)} |
Converts numeric from degrees to radians |
{fn RAND(numeric)} |
Returns a random double using numeric as the seed value |
{fn ROUND(numeric1, numeric2)} |
Rounds numeric1 to numeric2 places to the right of the decimal point |
{fn SIGN(numeric)} |
Returns the signum of numeric |
{fn SIN(numeric)} |
Returns the sine of numeric |
{fn SQRT(numeric)} |
Returns the square root of numeric |
{fn TAN(numeric)} |
Returns the tangent of numeric |
{fn TRUNCATE(numeric1, numeric2)} |
Truncates numeric1 to numeric2 places to the right of the decimal point |
17.2. String
These are the String JDBC function escapes:
Operator syntax | Description |
---|---|
{fn ASCII(string)} |
Returns the ASCII code of the first character of string; if the first character is a non-ASCII character, returns its Unicode code point; returns 0 if string is empty |
{fn CONCAT(character, character)} |
Returns the concatenation of character strings |
{fn INSERT(string1, start, length, string2)} |
Inserts string2 into a slot in string1 |
{fn LCASE(string)} |
Returns a string in which all alphabetic characters in string have been converted to lower case |
{fn LENGTH(string)} |
Returns the number of characters in a string |
{fn LOCATE(string1, string2 [, integer])} |
Returns the position in string2 of the first occurrence of string1. Searches from the beginning of string2, unless integer is specified. |
{fn LEFT(string, length)} |
Returns the leftmost length characters from string |
{fn LTRIM(string)} |
Returns string with leading space characters removed |
{fn REPLACE(string, search, replacement)} |
Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed |
{fn REVERSE(string)} |
Returns string with the order of the characters reversed |
{fn RIGHT(string, length)} |
Returns the rightmost length characters from string |
{fn RTRIM(string)} |
Returns string with trailing space characters removed |
{fn SUBSTRING(string, offset, length)} |
Returns a character string that consists of length characters from string starting at the offset position |
{fn UCASE(string)} |
Returns a string in which all alphabetic characters in string have been converted to upper case |
17.3. Date/time
These are the DATE and TIME JDBC function escapes:
Operator syntax | Description |
---|---|
{fn CURDATE()} |
Equivalent to CURRENT_DATE |
{fn CURTIME()} |
Equivalent to LOCALTIME |
{fn NOW()} |
Equivalent to LOCALTIMESTAMP |
{fn YEAR(date)} |
Equivalent to EXTRACT(YEAR FROM date) |
{fn QUARTER(date)} |
Equivalent to EXTRACT(QUARTER FROM date) |
{fn MONTH(date)} |
Equivalent to EXTRACT(MONTH FROM date) |
{fn WEEK(date)} |
Equivalent to EXTRACT(WEEK FROM date) |
{fn DAYOFYEAR(date)} |
Equivalent to EXTRACT(DOY FROM date) |
{fn DAYOFMONTH(date)} |
Equivalent to EXTRACT(DAY FROM date) |
{fn DAYOFWEEK(date)} |
Equivalent to EXTRACT(DOW FROM date) |
{fn HOUR(date)} |
Equivalent to EXTRACT(HOUR FROM date) |
{fn MINUTE(date)} |
Equivalent to EXTRACT(MINUTE FROM date) |
{fn SECOND(date)} |
Equivalent to EXTRACT(SECOND FROM date) |
{fn TIMESTAMPADD(timeUnit, count, datetime)} |
Adds an interval of count timeUnits to a datetime |
{fn TIMESTAMPDIFF(timeUnit, timestamp1, timestamp2)} |
Subtracts timestamp1 from timestamp2 and returns the result in timeUnits |
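A sketch of the two timestamp escapes, using literal timestamps so the example is self-contained:

```sql
SELECT {fn TIMESTAMPADD(HOUR, 2, timestamp'2024-07-11 03:00:00')} AS plus_2h,
       {fn TIMESTAMPDIFF(DAY, timestamp'2024-07-01 00:00:00',
                         timestamp'2024-07-11 00:00:00')} AS days_between
FROM (VALUES (0)) AS t(dummy);
```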