SQL language

This page describes the SQL dialect (Calcite) recognized by LeanXcale default SQL parser. Query engine from LeanXcale forked from Apache Calcite project so there are a lot of points in common with respect to grammar, though there have been a few major changes regarding transaction management, DML extenstions and optimization for the use of LeanXcales KiVi datastore.

Grammar

SQL grammar in BNF-like form.

statement:
      setStatement
  |   resetStatement
  |   explain
  |   describe
  |   insert
  |   update
  |   merge
  |   delete
  |   query

statementList:
      statement [ ';' statement ]* [ ';' ]

explain:
      EXPLAIN PLAN
      [ WITH TYPE | WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION ]
      [ EXCLUDING ATTRIBUTES | INCLUDING [ ALL ] ATTRIBUTES ]
      [ AS JSON | AS XML ]
      FOR ( query | insert | update | delete )

insert:
      ( INSERT | UPSERT ) INTO tablePrimary
      [ '(' column [, column ]* ')' ]
      query

update:
      UPDATE tablePrimary
      SET assign [, assign ]*
      [ WHERE booleanExpression ]

assign:
      identifier '=' expression

delete:
      DELETE FROM tablePrimary [ [ AS ] alias ]
      [ WHERE booleanExpression ]

query:
      values
  |   WITH withItem [ , withItem ]* query
  |   {
          select
      |   selectWithoutFrom
      |   query UNION [ ALL | DISTINCT ] query
      |   query EXCEPT [ ALL | DISTINCT ] query
      |   query INTERSECT [ ALL | DISTINCT ] query
      }
      [ ORDER BY orderItem [, orderItem ]* ]
      [ LIMIT { count | ALL } ]
      [ OFFSET start { ROW | ROWS } ]
      [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]

withItem:
      name
      [ '(' column [, column ]* ')' ]
      AS '(' query ')'

orderItem:
      expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]

select:
      SELECT [ ALL | DISTINCT ]
          { * | projectItem [, projectItem ]* }
      FROM tableExpression
      [ WHERE booleanExpression ]
      [ GROUP BY { groupItem [, groupItem ]* } ]
      [ HAVING booleanExpression ]
      [ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]

selectWithoutFrom:
      SELECT [ ALL | DISTINCT ]
          { * | projectItem [, projectItem ]* }

projectItem:
      expression [ [ AS ] columnAlias ]
  |   tableAlias . *

tableExpression:
      tableReference [, tableReference ]*
  |   tableExpression [ NATURAL ] [ ( LEFT | RIGHT | FULL ) [ OUTER ] ] JOIN tableExpression [ joinCondition ]
  |   tableExpression CROSS JOIN tableExpression

joinCondition:
      ON booleanExpression
  |   USING '(' column [, column ]* ')'

tableReference:
      tablePrimary
      [ matchRecognize ]
      [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]

tablePrimary:
      [ [ catalogName . ] schemaName . ] tableName
      '(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'
  |   [ LATERAL ] '(' query ')'
  |   UNNEST '(' expression ')' [ WITH ORDINALITY ]
  |   [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]* ')' ')'

columnDecl:
      column type [ NOT NULL ]

optionValue:
      stringLiteral
  |   numericLiteral

values:
      VALUES expression [, expression ]*

groupItem:
      expression
  |   '(' ')'
  |   '(' expression [, expression ]* ')'
  |   CUBE '(' expression [, expression ]* ')'
  |   ROLLUP '(' expression [, expression ]* ')'
  |   GROUPING SETS '(' groupItem [, groupItem ]* ')'

window:
      windowName
  |   windowSpec

windowSpec:
      '('
      [ windowName ]
      [ ORDER BY orderItem [, orderItem ]* ]
      [ PARTITION BY expression [, expression ]* ]
      [
          RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
      |   ROWS numericExpression { PRECEDING | FOLLOWING }
      ]
      ')'

In insert, if the INSERT or UPSERT statement does not specify a list of target columns, the query must have the same number of columns as the target table and the order of the fields in the VALUES clause must be the same as the order of the fields in the table.

In orderItem, if expression is a positive integer n, it denotes the nth item in the SELECT clause. As an example:

SELECT F1, F2, F3, F4 FROM T ORDER BY 1,2

This will order by F1 and then F2 which are the first and second in the SELECT clause.

In query, count and start may each be either an unsigned integer literal or a dynamic parameter whose value is an integer.

An aggregate query is a query that contains a GROUP BY or a HAVING clause, or aggregate functions in the SELECT clause. In the SELECT, HAVING and ORDER BY clauses of an aggregate query, all expressions must be constant within the current group (that is, grouping constants as defined by the GROUP BY clause, or constants), or aggregate functions, or a combination of constants and aggregate functions. Aggregate and grouping functions may only appear in an aggregate query, and only in a SELECT, HAVING or ORDER BY clause.

A scalar sub-query is a sub-query used as an expression. If the sub-query returns no rows, the value is NULL; if it returns more than one row, it is an error.

IN, EXISTS and scalar sub-queries can occur in any place where an expression can occur (such as the SELECT clause, WHERE clause, ON clause of a JOIN, or as an argument to an aggregate function).

An IN, EXISTS or scalar sub-query may be correlated; that is, it may refer to tables in the FROM clause of an enclosing query.

selectWithoutFrom is equivalent to VALUES, but is not standard SQL. An example follows:

SELECT 1, CURRENT_TIMESTAMP;

This just yields the values 1 and the current timestamp in the database server.

MINUS is not supported. EXCEPT is and is completely equivalent to MINUS.

“LIMIT start, count” is not supported, but you can use the equivalent “LIMIT count OFFSET start”

Keywords

The following is a list of SQL keywords. Reserved keywords are bold.

A, ABS, ABSENT, ABSOLUTE, ACTION, ADA, ADD, ADMIN, AFTER, ALL, ALLOCATE, ALLOW, ALTER, ALWAYS, AND, ANY, APPLY, ARE, ARRAY, ARRAY_MAX_CARDINALITY, AS, ASC, ASENSITIVE, ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTRIBUTE, ATTRIBUTES, AUTHORIZATION, AVG, BEFORE, BEGIN, BEGIN_FRAME, BEGIN_PARTITION, BERNOULLI, BETWEEN, BIGINT, BINARY, BIT, BLOB, BOOLEAN, BOTH, BREADTH, BY, C, CALL, CALLED, CARDINALITY, CASCADE, CASCADED, CASE, CAST, CATALOG, CATALOG_NAME, CEIL, CEILING, CENTURY, CHAIN, CHAR, CHARACTER, CHARACTERISTICS, CHARACTERS, CHARACTER_LENGTH, CHARACTER_SET_CATALOG, CHARACTER_SET_NAME, CHARACTER_SET_SCHEMA, CHAR_LENGTH, CHECK, CLASSIFIER, CLASS_ORIGIN, CLOB, CLOSE, COALESCE, COBOL, COLLATE, COLLATION, COLLATION_CATALOG, COLLATION_NAME, COLLATION_SCHEMA, COLLECT, COLUMN, COLUMN_NAME, COMMAND_FUNCTION, COMMAND_FUNCTION_CODE, COMMIT, COMMITTED, CONDITION, CONDITIONAL, CONDITION_NUMBER, CONNECT, CONNECTION, CONNECTION_NAME, CONSTRAINT, CONSTRAINTS, CONSTRAINT_CATALOG, CONSTRAINT_NAME, CONSTRAINT_SCHEMA, CONSTRUCTOR, CONTAINS, CONTINUE, CONVERT, CORR, CORRESPONDING, COUNT, COVAR_POP, COVAR_SAMP, CREATE, CROSS, CUBE, CUME_DIST, CURRENT, CURRENT_CATALOG, CURRENT_DATE, CURRENT_DEFAULT_TRANSFORM_GROUP, CURRENT_PATH, CURRENT_ROLE, CURRENT_ROW, CURRENT_SCHEMA, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_TRANSFORM_GROUP_FOR_TYPE, CURRENT_USER, CURSOR, CURSOR_NAME, CYCLE, DATA, DATABASE, DATE, DATETIME_INTERVAL_CODE, DATETIME_INTERVAL_PRECISION, DAY, DAYS, DEALLOCATE, DEC, DECADE, DECIMAL, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE, DEFERRED, DEFINE, DEFINED, DEFINER, DEGREE, DELETE, DENSE_RANK, DEPTH, DEREF, DERIVED, DESC, DESCRIBE, DESCRIPTION, DESCRIPTOR, DETERMINISTIC, DIAGNOSTICS, DISALLOW, DISCONNECT, DISPATCH, DISTINCT, DOMAIN, DOUBLE, DOW, DOY, DROP, DYNAMIC, DYNAMIC_FUNCTION, DYNAMIC_FUNCTION_CODE, EACH, ELEMENT, ELSE, EMPTY, ENCODING, END, END-EXEC, END_FRAME, END_PARTITION, EPOCH, EQUALS, ERROR, ESCAPE, EVERY, EXCEPT, EXCEPTION, EXCLUDE, EXCLUDING, EXEC, EXECUTE, EXISTS, EXP, EXPLAIN, EXTEND, EXTERNAL, EXTRACT, FALSE, FETCH, FILTER, FINAL, FIRST, FIRST_VALUE, FLOAT, FLOOR, FOLLOWING, FOR, FOREIGN, FORMAT, FORTRAN, FOUND, FRAC_SECOND, FRAME_ROW, FREE, FROM, FULL, FUNCTION, FUSION, G, GENERAL, GENERATED, GEOMETRY, GET, GLOBAL, GO, GOTO, GRANT, GRANTED, GROUP, GROUPING, GROUPS, HAVING, HIERARCHY, HOLD, HOUR, HOURS, IDENTITY, IGNORE, IMMEDIATE, IMMEDIATELY, IMPLEMENTATION, IMPORT, IN, INCLUDING, INCREMENT, INDICATOR, INITIAL, INITIALLY, INNER, INOUT, INPUT, INSENSITIVE, INSERT, INSTANCE, INSTANTIABLE, INT, INTEGER, INTERSECT, INTERSECTION, INTERVAL, INTO, INVOKER, IS, ISODOW, ISOLATION, ISOYEAR, JAVA, JOIN, JSON, JSON_ARRAY, JSON_ARRAYAGG, JSON_EXISTS, JSON_OBJECT, JSON_OBJECTAGG, JSON_QUERY, JSON_VALUE, K, KEY, KEY_MEMBER, KEY_TYPE, LABEL, LAG, LANGUAGE, LARGE, LAST, LAST_VALUE, LATERAL, LEAD, LEADING, LEFT, LENGTH, LEVEL, LIBRARY, LIKE, LIKE_REGEX, LIMIT, LN, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATOR, LOWER, M, MAP, MATCH, MATCHED, MATCHES, MATCH_NUMBER, MATCH_RECOGNIZE, MAX, MAXVALUE, MEASURES, MEMBER, MERGE, MESSAGE_LENGTH, MESSAGE_OCTET_LENGTH, MESSAGE_TEXT, METHOD, MICROSECOND, MILLENNIUM, MILLISECOND, MIN, MINUS, MINUTE, MINUTES, MINVALUE, MOD, MODIFIES, MODULE, MONTH, MONTHS, MORE, MULTISET, MUMPS, NAME, NAMES, NANOSECOND, NATIONAL, NATURAL, NCHAR, NCLOB, NESTING, NEW, NEXT, NO, NONE, NORMALIZE, NORMALIZED, NOT, NTH_VALUE, NTILE, NULL, NULLABLE, NULLIF, NULLS, NUMBER, NUMERIC, OBJECT, OCCURRENCES_REGEX, OCTETS, OCTET_LENGTH, OF, OFFSET, OLD, OMIT, ON, ONE, ONLY, OPEN, OPTION, OPTIONS, OR, ORDER, ORDERING, ORDINALITY, OTHERS, OUT, OUTER, OUTPUT, OVER, OVERLAPS, OVERLAY, OVERRIDING, PAD, PARAMETER, PARAMETER_MODE, PARAMETER_NAME, PARAMETER_ORDINAL_POSITION, PARAMETER_SPECIFIC_CATALOG, PARAMETER_SPECIFIC_NAME, PARAMETER_SPECIFIC_SCHEMA, PARTIAL, PARTITION, PASCAL, PASSING, PASSTHROUGH, PAST, PATH, PATTERN, PER, PERCENT, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, PERIOD, PERMUTE, PLACING, PLAN, PLI, PORTION, POSITION, POSITION_REGEX, POWER, PRECEDES, PRECEDING, PRECISION, PREPARE, PRESERVE, PREV, PRIMARY, PRIOR, PRIVILEGES, PROCEDURE, PUBLIC, QUARTER, RANGE, RANK, READ, READS, REAL, RECURSIVE, REF, REFERENCES, REFERENCING, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY, RELATIVE, RELEASE, REPEATABLE, REPLACE, RESET, RESPECT, RESTART, RESTRICT, RESULT, RETURN, RETURNED_CARDINALITY, RETURNED_LENGTH, RETURNED_OCTET_LENGTH, RETURNED_SQLSTATE, RETURNING, RETURNS, REVOKE, RIGHT, ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINE_CATALOG, ROUTINE_NAME, ROUTINE_SCHEMA, ROW, ROWS, ROW_COUNT, ROW_NUMBER, RUNNING, SAVEPOINT, SCALAR, SCALE, SCHEMA, SCHEMA_NAME, SCOPE, SCOPE_CATALOGS, SCOPE_NAME, SCOPE_SCHEMA, SCROLL, SEARCH, SECOND, SECONDS, SECTION, SECURITY, SEEK, SELECT, SELECT_FOR_UPDATE, SELF, SENSITIVE, SEQUENCE, SERIALIZABLE, SERVER, SERVER_NAME, SESSION, SESSION_USER, SET, SETS, SHOW, SIMILAR, SIMPLE, SIZE, SKIP, SMALLINT, SOME, SOURCE, SPACE, SPECIFIC, SPECIFICTYPE, SPECIFIC_NAME, SQL, SQLEXCEPTION, SQLSTATE, SQLWARNING, SQL_BIGINT, SQL_BINARY, SQL_BIT, SQL_BLOB, SQL_BOOLEAN, SQL_CHAR, SQL_CLOB, SQL_DATE, SQL_DECIMAL, SQL_DOUBLE, SQL_FLOAT, SQL_INTEGER, SQL_INTERVAL_DAY, SQL_INTERVAL_DAY_TO_HOUR, SQL_INTERVAL_DAY_TO_MINUTE, SQL_INTERVAL_DAY_TO_SECOND, SQL_INTERVAL_HOUR, SQL_INTERVAL_HOUR_TO_MINUTE, SQL_INTERVAL_HOUR_TO_SECOND, SQL_INTERVAL_MINUTE, SQL_INTERVAL_MINUTE_TO_SECOND, SQL_INTERVAL_MONTH, SQL_INTERVAL_SECOND, SQL_INTERVAL_YEAR, SQL_INTERVAL_YEAR_TO_MONTH, SQL_LONGVARBINARY, SQL_LONGVARCHAR, SQL_LONGVARNCHAR, SQL_NCHAR, SQL_NCLOB, SQL_NUMERIC, SQL_NVARCHAR, SQL_REAL, SQL_SMALLINT, SQL_TIME, SQL_TIMESTAMP, SQL_TINYINT, SQL_TSI_DAY, SQL_TSI_FRAC_SECOND, SQL_TSI_HOUR, SQL_TSI_MICROSECOND, SQL_TSI_MINUTE, SQL_TSI_MONTH, SQL_TSI_QUARTER, SQL_TSI_SECOND, SQL_TSI_WEEK, SQL_TSI_YEAR, SQL_VARBINARY, SQL_VARCHAR, SQRT, START, STATE, STATEMENT, STATIC, STDDEV_POP, STDDEV_SAMP, STREAM, STRUCTURE, STYLE, SUBCLASS_ORIGIN, SUBMULTISET, SUBSET, SUBSTITUTE, SUBSTRING, SUBSTRING_REGEX, SUCCEEDS, SUM, SYMMETRIC, SYSTEM, SYSTEM_TIME, SYSTEM_USER, TABLE, TABLESAMPLE, TABLE_NAME, TEMPORARY, THEN, TIES, TIME, TIMESTAMP, TIMESTAMPADD, TIMESTAMPDIFF, TIMEZONE_HOUR, TIMEZONE_MINUTE, TINYINT, TO, TOP_LEVEL_COUNT, TRAILING, TRANSACTION, TRANSACTIONS_ACTIVE, TRANSACTIONS_COMMITTED, TRANSACTIONS_ROLLED_BACK, TRANSFORM, TRANSFORMS, TRANSLATE, TRANSLATE_REGEX, TRANSLATION, TREAT, TRIGGER, TRIGGER_CATALOG, TRIGGER_NAME, TRIGGER_SCHEMA, TRIM, TRIM_ARRAY, TRUE, TRUNCATE, TUMBLE, TYPE, UESCAPE, UNBOUNDED, UNCOMMITTED, UNCONDITIONAL, UNDER, UNIFORM, UNION, UNIQUE, UNKNOWN, UNNAMED, UNNEST, UPDATE, UPPER, UPSERT, USAGE, USER, USER_DEFINED_TYPE_CATALOG, USER_DEFINED_TYPE_CODE, USER_DEFINED_TYPE_NAME, USER_DEFINED_TYPE_SCHEMA, USING, UTF16, UTF32, UTF8, VALUE, VALUES, VALUE_OF, VARBINARY, VARCHAR, VARYING, VAR_POP, VAR_SAMP, VERSION, VERSIONING, VIEW, WEEK, WHEN, WHENEVER, WHERE, WIDTH_BUCKET, WINDOW, WITH, WITHIN, WITHOUT, WORK, WRAPPER, WRITE, XML, YEAR, YEARS, ZONE.

If you want to use any of these keywords as part of your column names or whatever, you need to double quote the identifier: "". As an example:

CREATE TABLE T1 (k1 BIGINT PRIMARY KEY, "values" BIGINT)

Identifiers

Identifiers are the names of tables, columns and other metadata elements used in a SQL query.

Unquoted identifiers, such as emp, must start with a letter and can only contain letters, digits, and underscores. They are implicitly converted to upper case.

Quoted identifiers, such as "Employee Name", start and end with double quotes. They may contain virtually any character, including spaces and other punctuation. If you wish to include a double quote in an identifier, use another double quote to escape it, like this: "An employee called ""Fred"".".

Matching identifiers to the name of the referenced object is case-sensitive. But remember that unquoted identifiers are implicitly converted to upper case before matching, and if the object it refers to was created using an unquoted identifier for its name, then its name will have been converted to upper case also.

Data-types

Scalar-types

Data type Description Range and example literals

BOOLEAN

Logical values

Values: TRUE, FALSE, UNKNOWN

TINYINT

1 byte signed integer

Range is -128 to 127

SMALLINT

2 byte signed integer

Range is -32768 to 32767

INTEGER, INT

4 byte signed integer

Range is -2147483648 to 2147483647

BIGINT

8 byte signed integer

Range is -9223372036854775808 to 9223372036854775807

DECIMAL(p, s)

Fixed point

Example: 123.45 is a DECIMAL(5, 2) value.

NUMERIC

Fixed point

REAL, FLOAT

4 byte floating point

6 decimal digits precision

DOUBLE

8 byte floating point

15 decimal digits precision

CHAR(n), CHARACTER(n)

Fixed-width character string

‘Hello’, ‘’ (empty string), _latin1’Hello’, n’Hello’, _UTF16’Hello’, ‘Hello’ ‘there’ (literal split into multiple parts)

VARCHAR(n), CHARACTER VARYING(n)

Variable-length character string

As CHAR(n)

BINARY(n)

Fixed-width binary string

x’45F0AB’, x’’ (empty binary string), x’AB’ ‘CD’ (multi-part binary string literal)

VARBINARY(n), BINARY VARYING(n)

Variable-length binary string

As BINARY(n)

DATE

Date

Example: DATE ‘1969-07-20’

TIME

Time of day

Example: TIME ‘20:17:40’

TIMESTAMP

Date and time

Example: TIMESTAMP ‘1969-07-20 20:17:40’

INTERVAL timeUnit [ TO timeUnit ]

Date time interval

Examples: INTERVAL ‘1-5’ YEAR TO MONTH, INTERVAL ‘45’ DAY, INTERVAL ‘1 2:34:56.789’ DAY TO SECOND

You can configure the database to use the TIMESTAMP in either one of two behaviors - The default behavior is TIMESTAMP have no implicit timezone. And are seen as written regardless of where the client is located. This means if a client in GMT+6 inserts.

+

INSERT INTO T VALUES(TIMESTAMP '1969-07-20 20:17:40')

+ Another client, located in another timezone (let’s say GMT-6) will see:

+

SELECT TS FROM T
1969-07-20 20:17:40

+ Timestamp was written as whatever and is seen as whatever no conversion is done. It’s up to the end-user to decide if the TIMESTAMP is representing UTC or whatever.

  • You may configure the TIMESTAMP to behave always as TIMESTAMP WITH LOCAL TIME ZONE. This means that the TIMESTAMP you INSERT will be converted to UTC using your local settings and the reader will see the TIMESTAMP converted to its local settings. In the case above:

Writer(GMT+6)
INSERT INTO T VALUES(TIMESTAMP ‘1969-07-20 20:17:40’)

Database: UTC: 1969-07-20 14:17:40

Reader(GMT-6)
SELECT TS FROM T
1969-07-20 08:17:40

Where:

timeUnit:
  MILLENNIUM | CENTURY | DECADE | YEAR | QUARTER | MONTH | WEEK | DOY | DOW | DAY | HOUR | MINUTE | SECOND | EPOCH

Note:

  • As stated above, by default, DATE, TIME and TIMESTAMP have no time zone. For those types, there is not even an implicit time zone, such as UTC (as in Java) or the local time zone. It is left to the user or application to supply a time zone. In turn, TIMESTAMP WITH LOCAL TIME ZONE does not store the time zone internally, times will be stored UTC and it will rely on the supplied time zone to provide correct semantics.

  • Interval literals may only use time units YEAR, MONTH, DAY, HOUR, MINUTE and SECOND.

Convert Timestamps between Timezones

SELECT CAST(TZ_TIMESTAMP('1969-07-20 20:17:40 America/Montreal') AS TIMESTAMP);

Non-scalar-types

Type Description Example literals

MAP

Collection of keys mapped to values

MULTISET

Unordered collection that may contain duplicates

Example: int multiset

ARRAY

Ordered, contiguous collection that may contain duplicates

Example: varchar(10) array

CURSOR

Cursor over the result of executing a query

Note:

  • Every ROW column type can have an optional [ NULL | NOT NULL ] suffix to indicate if this column type is nullable, default is not nullable.

MAP format is mapped to VARBINARY type internally and can be used as part of the table to support key value maps. MAP type cannot be used as part of the primary key. An example follows:

CREATE TABLE TMAP (c1 BIGINT, mapfield MAP, PRIMARY KEY (c1));

UPSERT INTO TMAP VALUES(1, MAP['key1', 121]);

SELECT mapfield['key1'] FROM TMAP;

The result of the SELECT above is 121.

Operators and functions

Operator precedence

The operator precedence and associativity, highest to lowest.

Operator Associativity

.

left

::

left

[ ] (array element)

left

+ - (unary plus, minus)

right

* / %

left

+ -

left

BETWEEN, IN, LIKE, SIMILAR, OVERLAPS, CONTAINS etc.

-

< > = ⇐ >= <> !=

left

IS NULL, IS FALSE, IS NOT TRUE etc.

-

NOT

right

AND

left

OR

left

Note that :: is dialect-specific, but is shown in this table for completeness.

Comparison operators

Operator syntax Description

value1 = value2

Equals

value1 <> value2

Not equal

value1 > value2

Greater than

value1 >= value2

Greater than or equal

value1 < value2

Less than

value1 ⇐ value2

Less than or equal

value IS NULL

Whether value is null

value IS NOT NULL

Whether value is not null

value1 IS DISTINCT FROM value2

Whether two values are not equal, treating null values as the same

value1 IS NOT DISTINCT FROM value2

Whether two values are equal, treating null values as the same

value1 BETWEEN value2 AND value3

Whether value1 is greater than or equal to value2 and less than or equal to value3

value1 NOT BETWEEN value2 AND value3

Whether value1 is less than value2 or greater than value3

string1 LIKE string2 [ ESCAPE string3 ]

Whether string1 matches pattern string2

string1 NOT LIKE string2 [ ESCAPE string3 ]

Whether string1 does not match pattern string2

string1 SIMILAR TO string2 [ ESCAPE string3 ]

Whether string1 matches regular expression string2

string1 NOT SIMILAR TO string2 [ ESCAPE string3 ]

Whether string1 does not match regular expression string2

value IN (value [, value]*)

Whether value is equal to a value in a list

value NOT IN (value [, value]*)

Whether value is not equal to every value in a list

value IN (sub-query)

Whether value is equal to a row returned by sub-query

value NOT IN (sub-query)

Whether value is not equal to every row returned by sub-query

value comparison SOME (sub-query)

Whether value comparison at least one row returned by sub-query

value comparison ANY (sub-query)

Synonym for SOME

value comparison ALL (sub-query)

Whether value comparison every row returned by sub-query

EXISTS (sub-query)

Whether sub-query returns at least one row

comp:
      =
  |   <>
  |   >
  |   >=
  |   <
  |   <=

Regular expressions

LIKE and SIMILAR TO operators allow regular expressions. You can use SQL traditional expressions using special characters ('%','_') and you can also use regular expressions with a syntax similar to perl-like regular expressions. Specifically, regular expressions may contain.

  • A rune, standing for itself.

  • '.', standing for any rune.

  • '^', start of text.

  • '$', end of text.

  • '[a-z]', set of runes. Multiple ranges and single runes ok. '[^a-z]', set or excluded runes. '\a', alpha runes (also ok within sets). Includes ''. '\u', upper runes (also ok within sets). Includes ''. '\l', lower runes (also ok within sets). Includes ''. '\b', blank runes (also ok within sets). '\w', word runes (also ok within sets). Includes ''.

They may also contain the following operators (from low to high precedence):

  • '(re)', for grouping

  • 're|re', alternative

  • concatenation (implicit, no operator)

  • 're*', zero to any number of times the left regexp.

  • 're+', one or any further number of times the left regexp.

  • 're?', zero or one time the left regexp.

The main difference between LIKE and SIMILAR TO is that LIKE implies the expression will start with the text in the expression while SIMILAR doesn’t imply '^'.

Logical operators

Operator syntax Description

boolean1 OR boolean2

Whether boolean1 is TRUE or boolean2 is TRUE

boolean1 AND boolean2

Whether boolean1 and boolean2 are both TRUE

NOT boolean

Whether boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN

boolean IS FALSE

Whether boolean is FALSE; returns FALSE if boolean is UNKNOWN

boolean IS NOT FALSE

Whether boolean is not FALSE; returns TRUE if boolean is UNKNOWN

boolean IS TRUE

Whether boolean is TRUE; returns FALSE if boolean is UNKNOWN

boolean IS NOT TRUE

Whether boolean is not TRUE; returns TRUE if boolean is UNKNOWN

boolean IS UNKNOWN

Whether boolean is UNKNOWN

boolean IS NOT UNKNOWN

Whether boolean is not UNKNOWN

Arithmetic operators and functions

Operator syntax Description

+ numeric

Returns numeric

- numeric

Returns negative numeric

numeric1 + numeric2

Returns numeric1 plus numeric2

numeric1 - numeric2

Returns numeric1 minus numeric2

numeric1 * numeric2

Returns numeric1 multiplied by numeric2

numeric1 / numeric2

Returns numeric1 divided by numeric2

POWER(numeric1, numeric2)

Returns numeric1 raised to the power of numeric2

ABS(numeric)

Returns the absolute value of numeric

MOD(numeric1, numeric2)

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

SQRT(numeric)

Returns the square root of numeric

LN(numeric)

Returns the natural logarithm (base e) of numeric

LOG10(numeric)

Returns the base 10 logarithm of numeric

EXP(numeric)

Returns e raised to the power of numeric

CEIL(numeric)

Rounds numeric up, returning the smallest integer that is greater than or equal to numeric

FLOOR(numeric)

Rounds numeric down, returning the largest integer that is less than or equal to numeric

RAND([seed])

Generates a random double between 0 and 1 inclusive, optionally initializing the random number generator with seed

RAND_INTEGER([seed, ] numeric)

Generates a random integer between 0 and numeric - 1 inclusive, optionally initializing the random number generator with seed

ACOS(numeric)

Returns the arc cosine of numeric

ASIN(numeric)

Returns the arc sine of numeric

ATAN(numeric)

Returns the arc tangent of numeric

ATAN2(numeric, numeric)

Returns the arc tangent of the numeric coordinates

CBRT(numeric)

Returns the cube root of numeric

COS(numeric)

Returns the cosine of numeric

COT(numeric)

Returns the cotangent of numeric

DEGREES(numeric)

Converts numeric from radians to degrees

PI()

Returns a value that is closer than any other value to pi

RADIANS(numeric)

Converts numeric from degrees to radians

ROUND(numeric1 [, numeric2])

Rounds numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

SIGN(numeric)

Returns the signum of numeric

SIN(numeric)

Returns the sine of numeric

TAN(numeric)

Returns the tangent of numeric

TRUNCATE(numeric1 [, numeric2])

Truncates numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point

Character string operators and functions

Operator syntax Description

string

string

Concatenates two character strings

CHAR_LENGTH(string)

Returns the number of characters in a character string

CHARACTER_LENGTH(string)

As CHAR_LENGTH(string)

UPPER(string)

Returns a character string converted to upper case

LOWER(string)

Returns a character string converted to lower case

POSITION(string1 IN string2)

Returns the position of the first occurrence of string1 in string2

POSITION(string1 IN string2 FROM integer)

Returns the position of the first occurrence of string1 in string2 starting at a given point (not standard SQL)

TRIM( \{ BOTH

LEADING

TRAILING } string1 FROM string2)

Removes the longest string containing only the characters in string1 from the start/end/both ends of string1

OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])

Replaces a substring of string1 with string2

SUBSTRING(string FROM integer)

Returns a substring of a character string starting at a given point

SUBSTRING(string FROM integer FOR integer)

Returns a substring of a character string starting at a given point with a given length

INITCAP(string)

Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

Not implemented:

  • SUBSTRING(string FROM regexp FOR regexp)

  • Above you may miss functions like LPAD and RPAD. The behaviour of those functions can be achieved through OVERLAY though LPAD and RPAD are more convenient. For example, the following query will pad at the end:

SELECT OVERLAY('MYTEXT' PLACING 'PAD' FROM CHAR_LENGTH('MYTEXT')+1 FOR 0)

Binary string operators and functions

Operator syntax Description

binary

binary

Concatenates two binary strings

POSITION(binary1 IN binary2)

Returns the position of the first occurrence of binary1 in binary2

POSITION(binary1 IN binary2 FROM integer)

Returns the position of the first occurrence of binary1 in binary2 starting at a given point (not standard SQL)

OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ])

Replaces a substring of binary1 with binary2

SUBSTRING(binary FROM integer)

Returns a substring of binary starting at a given point

SUBSTRING(binary FROM integer FOR integer)

Returns a substring of binary starting at a given point with a given length

Date/time functions

Operator syntax Description

LOCALTIME

Returns the current date and time in the session time zone in a value of datatype TIME

LOCALTIME(precision)

Returns the current date and time in the session time zone in a value of datatype TIME, with precision digits of precision

LOCALTIMESTAMP

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP

LOCALTIMESTAMP(precision)

Returns the current date and time in the session time zone in a value of datatype TIMESTAMP, with precision digits of precision

CURRENT_TIME

Returns the current time in the session time zone, in a value of datatype TIMESTAMP

CURRENT_DATE

Returns the current date in the session time zone, in a value of datatype DATE

CURRENT_TIMESTAMP

Returns the current date and time in the session time zone, in a value of datatype TIMESTAMP

EXTRACT(timeUnit FROM datetime)

Extracts and returns the value of a specified datetime field from a datetime value expression

FLOOR(datetime TO timeUnit)

Rounds datetime down to timeUnit

CEIL(datetime TO timeUnit)

Rounds datetime up to timeUnit

YEAR(date)

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

QUARTER(date)

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

MONTH(date)

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

WEEK(date)

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

DAYOFYEAR(date)

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

DAYOFMONTH(date)

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

DAYOFWEEK(date)

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

HOUR(date)

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

MINUTE(date)

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

SECOND(date)

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

TIMESTAMPADD(timeUnit, integer, datetime)

Returns datetime with an interval of (signed) integer timeUnits added. Equivalent to datetime + INTERVAL 'integer' timeUnit

TIMESTAMPDIFF(timeUnit, datetime, datetime2)

Returns the (signed) number of timeUnit intervals between datetime and datetime2.

LAST_DAY(date)

Returns the date of the last day of the month in a value of datatype DATE; For example, it returns DATE’2020-02-29’ for both DATE’2020-02-10’ and TIMESTAMP’2020-02-10 10:10:10’

SELECT * FROM TABLEDELTA WHERE timest > LOCALTIMESTAMP - INTERVAL '1' DAY

SELECT (LOCALTIMESTAMP - timest) HOUR FROM TABLEDELTA WHERE timest > LOCALTIMESTAMP - INTERVAL '4' DAY AND keyid = '10';

Calls to niladic functions such as CURRENT_DATE do not accept parentheses (nor in LeanXcale nor in standard SQL)

Not implemented:

  • CEIL(interval)

  • FLOOR(interval)

  • + interval

  • - interval

  • interval + interval

  • interval - interval

  • interval / interval

Conditional functions and operators

Operator syntax Description

CASE value
WHEN value1 [, value11 ]* THEN result1
[ WHEN valueN [, valueN1 ]* THEN resultN ]*
[ ELSE resultZ ]
END

Simple case

CASE
WHEN condition1 THEN result1
[ WHEN conditionN THEN resultN ]*
[ ELSE resultZ ]
END

Searched case

NULLIF(value, value)

Returns NULL if the values are the same.

For example, NULLIF(5, 5) returns NULL; NULLIF(5, 0) returns 5.

COALESCE(value, value [, value ]*)

Provides a value if the first value is null.

For example, COALESCE(NULL, 5) returns 5.

Type conversion

Generally an expression cannot contain values of different datatypes. For example, an expression cannot multiply 5 by 10 and then add ‘JULIAN’. However, supports both implicit and explicit conversion of values from one datatype to another.

Implicit and Explicit Type Conversion

Calcite recommends that you specify explicit conversions, rather than rely on implicit or automatic conversions, for these reasons:

  • SQL statements are easier to understand when you use explicit datatype conversion functions.

  • Implicit datatype conversion can have a negative impact on performance, especially if the datatype of a column value is converted to that of a constant rather than the other way around.

  • Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR value may return an unexpected format.

Algorithms for implicit conversion are subject to change across Calcite releases. Behavior of explicit conversions is more predictable.

Explicit type conversion

Operator syntax Description

CAST(value AS type)

Converts a value to a given type.

Supported data types syntax:

type:
      typeName
      [ collectionsTypeName ]*

typeName:
      sqlTypeName
  |   rowTypeName
  |   compoundIdentifier

sqlTypeName:
      char [ precision ] [ charSet ]
  |   varchar [ precision ] [ charSet ]
  |   DATE
  |   time
  |   timestamp
  |   GEOMETRY
  |   decimal [ precision [, scale] ]
  |   BOOLEAN
  |   integer
  |   BINARY [ precision ]
  |   varbinary [ precision ]
  |   TINYINT
  |   SMALLINT
  |   BIGINT
  |   REAL
  |   double
  |   FLOAT
  |   ANY [ precision [, scale] ]

collectionsTypeName:
      ARRAY | MULTISET

rowTypeName:
      ROW '('
      fieldName1 fieldType1 [ NULL | NOT NULL ]
      [ , fieldName2 fieldType2 [ NULL | NOT NULL ] ]*
      ')'

char:
      CHARACTER | CHAR

varchar:
      char VARYING | VARCHAR

decimal:
      DECIMAL | DEC | NUMERIC

integer:
      INTEGER | INT

varbinary:
      BINARY VARYING | VARBINARY

double:
      DOUBLE [ PRECISION ]

time:
      TIME [ precision ] [ timeZone ]

timestamp:
      TIMESTAMP [ precision ] [ timeZone ]

charSet:
      CHARACTER SET charSetName

timeZone:
      WITHOUT TIME ZONE
  |   WITH LOCAL TIME ZONE

Implicit type conversion

Calcite automatically converts a value from one datatype to another when such a conversion makes sense. The table below is a matrix of Calcite type conversions. The table shows all possible conversions, without regard to the context in which it is made. The rules governing these details follow the table.

FROM - TO NULL BOOLEAN TINYINT SMALLINT INT BIGINT DECIMAL FLOAT or REAL DOUBLE INTERVAL DATE TIME TIMESTAMP CHAR or VARCHAR BINARY or VARBINARY

NULL

i

i

i

i

i

i

i

i

i

i

i

i

i

i

i

BOOLEAN

x

i

e

e

e

e

e

e

e

x

x

x

x

i

x

TINYINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

SMALLINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

INT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

BIGINT

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

DECIMAL

x

e

i

i

i

i

i

i

i

e

x

x

e

i

x

FLOAT/REAL

x

e

i

i

i

i

i

i

i

x

x

x

e

i

x

DOUBLE

x

e

i

i

i

i

i

i

i

x

x

x

e

i

x

INTERVAL

x

x

e

e

e

e

e

x

x

i

x

x

x

e

x

DATE

x

x

x

x

x

x

x

x

x

x

i

x

i

i

x

TIME

x

x

x

x

x

x

x

x

x

x

x

i

e

i

x

TIMESTAMP

x

x

e

e

e

e

e

e

e

x

i

e

i

i

x

CHAR or VARCHAR

x

e

i

i

i

i

i

i

i

i

i

i

i

i

i

BINARY or VARBINARY

x

x

x

x

x

x

x

x

x

x

e

e

e

i

i

i: implicit cast / e: explicit cast / x: not allowed

Conversion Contexts and Strategies
  • Set operation (UNION, EXCEPT, INTERSECT): compare every branch row data type and find the common type of each fields pair;

  • Binary arithmetic expression (+, -, &, ^, /, %): promote string operand to data type of the other numeric operand;

  • Binary comparison (=, <, , <>, >, >=): if operands are STRING and TIMESTAMP, promote to TIMESTAMP; make 1 = true and 0 = false always evaluate to TRUE; if there is numeric type operand, find common type for both operands.

  • IN sub-query: compare type of LHS and RHS, and find the common type; if it is struct type, find wider type for every field;

  • IN expression list: compare every expression to find the common type;

  • CASE WHEN expression or COALESCE: find the common wider type of the THEN and ELSE operands;

  • Character + INTERVAL or character - INTERVAL: promote character to TIMESTAMP;

  • Built-in function: look up the type families registered in the checker, find the family default type if checker rules allow it;

  • User-defined function (UDF): coerce based on the declared argument types of the eval() method;

  • INSERT and UPDATE: coerce a source field to counterpart target table field’s type if the two fields differ with type name or precision(scale).

Note:

Implicit type coercion of following cases are ignored:

  • One of the type is ANY;

  • Type coercion within CHARACTER types are always ignored, i.e. from CHAR(20) to VARCHAR(30);

  • Type coercion from a numeric to another with higher precedence is ignored, i.e. from INT to LONG.

Strategies for Finding Common Type
  • If the operator has expected data types, just take them as the desired one. (e.g. the UDF would have eval() method which has reflection argument types);

  • If there is no expected data type but the data type families are registered, try to coerce the arguments to the family’s default data type, i.e. the String family will have a VARCHAR type;

  • If neither expected data type nor families are specified, try to find the tightest common type of the node types, i.e. INTEGER and DOUBLE will return DOUBLE, the numeric precision does not lose for this case;

  • If no tightest common type is found, try to find a wider type, i.e. VARCHAR and INTEGER will return INTEGER, we allow some precision loss when widening decimal to fractional, or promote to VARCHAR type.

Value constructors

Operator syntax Description

ROW (value [, value ]*)

Creates a row from a list of values.

(value [, value ]* )

Creates a row from a list of values.

map ‘[’ key ‘]’

Returns the element of a map with a particular key.

array ‘[’ index ‘]’

Returns the element at a particular location in an array.

ARRAY ‘[’ value [, value ]* ‘]’

Creates an array from a list of values.

MAP ‘[’ key, value [, key, value ]* ‘]’

Creates a map from a list of key-value pairs.

Collection functions

Operator syntax Description

ELEMENT(value)

Returns the sole element of an array or multiset; null if the collection is empty; throws if it has more than one element.

CARDINALITY(value)

Returns the number of elements in an array or multiset.

value MEMBER OF multiset

Returns whether the value is a member of multiset.

multiset IS A SET

Whether multiset is a set (has no duplicates).

multiset IS NOT A SET

Whether multiset is not a set (has duplicates).

multiset IS EMPTY

Whether multiset contains zero elements.

multiset IS NOT EMPTY

Whether multiset contains one or more elements.

multiset SUBMULTISET OF multiset2

Whether multiset is a submultiset of multiset2.

multiset NOT SUBMULTISET OF multiset2

Whether multiset is not a submultiset of multiset2.

multiset MULTISET UNION [ ALL

DISTINCT ] multiset2

Returns the union multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET INTERSECT [ ALL

DISTINCT ] multiset2

Returns the intersection of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default).

multiset MULTISET EXCEPT [ ALL

DISTINCT ] multiset2

See also: the UNNEST relational operator converts a collection to a relation.

Period predicates

Operator syntax Description

period1 CONTAINS datetime

period1 CONTAINS period2

period1 OVERLAPS period2

period1 EQUALS period2

period1 PRECEDES period2

period1 IMMEDIATELY PRECEDES period2

period1 SUCCEEDS period2

period1 IMMEDIATELY SUCCEEDS period2

Where period1 and period2 are period expressions:

period:
      (datetime, datetime)
  |   (datetime, interval)
  |   PERIOD (datetime, datetime)
  |   PERIOD (datetime, interval)

JDBC function escape

Numeric

Operator syntax Description

\{fn ABS(numeric)}

Returns the absolute value of numeric

\{fn ACOS(numeric)}

Returns the arc cosine of numeric

\{fn ASIN(numeric)}

Returns the arc sine of numeric

\{fn ATAN(numeric)}

Returns the arc tangent of numeric

\{fn ATAN2(numeric, numeric)}

Returns the arc tangent of the numeric coordinates

\{fn CBRT(numeric)}

Returns the cube root of numeric

\{fn CEILING(numeric)}

Rounds numeric up, and returns the smallest number that is greater than or equal to numeric

\{fn COS(numeric)}

Returns the cosine of numeric

\{fn COT(numeric)}

Returns the cotangent of numeric

\{fn DEGREES(numeric)}

Converts numeric from radians to degrees

\{fn EXP(numeric)}

Returns e raised to the power of numeric

\{fn FLOOR(numeric)}

Rounds numeric down, and returns the largest number that is less than or equal to numeric

\{fn LOG(numeric)}

Returns the natural logarithm (base e) of numeric

\{fn LOG10(numeric)}

Returns the base-10 logarithm of numeric

\{fn MOD(numeric1, numeric2)}

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative

\{fn PI()}

Returns a value that is closer than any other value to pi

\{fn POWER(numeric1, numeric2)}

Returns numeric1 raised to the power of numeric2

\{fn RADIANS(numeric)}

Converts numeric from degrees to radians

\{fn RAND(numeric)}

Returns a random double using numeric as the seed value

\{fn ROUND(numeric1, numeric2)}

Rounds numeric1 to numeric2 places right to the decimal point

\{fn SIGN(numeric)}

Returns the signum of numeric

\{fn SIN(numeric)}

Returns the sine of numeric

\{fn SQRT(numeric)}

Returns the square root of numeric

\{fn TAN(numeric)}

Returns the tangent of numeric

\{fn TRUNCATE(numeric1, numeric2)}

Truncates numeric1 to numeric2 places right to the decimal point

String

Operator syntax Description

\{fn ASCII(string)}

Returns the ASCII code of the first character of string; if the first character is a non-ASCII character, returns its Unicode code point; returns 0 if string is empty

\{fn CONCAT(character, character)}

Returns the concatenation of character strings

\{fn INSERT(string1, start, length, string2)}

Inserts string2 into a slot in string1

\{fn LCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to lower case

\{fn LENGTH(string)}

Returns the number of characters in a string

\{fn LOCATE(string1, string2 [, integer])}

Returns the position in string2 of the first occurrence of string1. Searches from the beginning of string2, unless integer is specified.

\{fn LEFT(string, length)}

Returns the leftmost length characters from string

\{fn LTRIM(string)}

Returns string with leading space characters removed

\{fn REPLACE(string, search, replacement)}

Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed

\{fn REVERSE(string)}

Returns string with the order of the characters reversed

\{fn RIGHT(string, integer)}

Returns the rightmost length characters from string

\{fn RTRIM(string)}

Returns string with trailing space characters removed

\{fn SUBSTRING(string, offset, length)}

Returns a character string that consists of length characters from string starting at the offset position

\{fn UCASE(string)}

Returns a string in which all alphabetic characters in string have been converted to upper case

Not implemented:

  • \{fn CHAR(string)}

Date/time

Operator syntax Description

\{fn CURDATE()}

Equivalent to CURRENT_DATE

\{fn CURTIME()}

Equivalent to LOCALTIME

\{fn NOW()}

Equivalent to LOCALTIMESTAMP

\{fn YEAR(date)}

Equivalent to EXTRACT(YEAR FROM date). Returns an integer.

\{fn QUARTER(date)}

Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4.

\{fn MONTH(date)}

Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.

\{fn WEEK(date)}

Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.

\{fn DAYOFYEAR(date)}

Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

\{fn DAYOFMONTH(date)}

Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31.

\{fn DAYOFWEEK(date)}

Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7.

\{fn HOUR(date)}

Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23.

\{fn MINUTE(date)}

Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59.

\{fn SECOND(date)}

Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59.

\{fn TIMESTAMPADD(timeUnit, count, datetime)}

Adds an interval of count timeUnits to a datetime

\{fn TIMESTAMPDIFF(timeUnit, timestamp1, timestamp2)}

Subtracts timestamp1 from timestamp2 and returns the result in timeUnits

System

Operator syntax Description

\{fn DATABASE()}

Equivalent to CURRENT_CATALOG

\{fn IFNULL(value1, value2)}

Returns value2 if value1 is null

\{fn USER()}

Equivalent to CURRENT_USER

Conversion

Operator syntax Description

\{fn CONVERT(value, type)}

Cast value into type

Aggregate functions

Syntax:

aggregateCall:
        agg( [ ALL | DISTINCT ] value [, value ]*)
        [ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
        [ FILTER (WHERE condition) ]
    |   agg(*) [ FILTER (WHERE condition) ]

where agg is one of the operators in the following table, or a user-defined aggregate function.

If FILTER is present, the aggregate function only considers rows for which condition evaluates to TRUE.

If DISTINCT is present, duplicate argument values are eliminated before being passed to the aggregate function.

If WITHIN GROUP is present, the aggregate function sorts the input rows according to the ORDER BY clause inside WITHIN GROUP before aggregating values. WITHIN GROUP is only allowed for hypothetical set functions (RANK, DENSE_RANK, PERCENT_RANK and CUME_DIST), inverse distribution functions (PERCENTILE_CONT and PERCENTILE_DISC) and collection functions (COLLECT and LISTAGG).

Operator syntax Description

COLLECT( [ ALL

DISTINCT ] value)

Returns a multiset of the values

LISTAGG( [ ALL

DISTINCT ] value [, separator])

Returns values concatenated into a string, delimited by separator (default ‘,’)

COUNT( [ ALL

DISTINCT ] value [, value ]*)

Returns the number of input rows for which value is not null (wholly not null if value is composite)

COUNT(*)

Returns the number of input rows

FUSION(multiset)

Returns the multiset union of multiset across all input values

APPROX_COUNT_DISTINCT(value [, value ]*)

Returns the approximate number of distinct values of value; the database is allowed to use an approximation but is not required to

AVG( [ ALL

DISTINCT ] numeric)

Returns the average (arithmetic mean) of numeric across all input values

SUM( [ ALL

DISTINCT ] numeric)

Returns the sum of numeric across all input values

MAX( [ ALL

DISTINCT ] value)

Returns the maximum value of value across all input values

MIN( [ ALL

DISTINCT ] value)

Returns the minimum value of value across all input values

ANY_VALUE( [ ALL

DISTINCT ] value)

Returns one of the values of value across all input values; this is NOT specified in the SQL standard

BIT_AND( [ ALL

DISTINCT ] value)

Returns the bitwise AND of all non-null input values, or null if none

BIT_OR( [ ALL

DISTINCT ] value)

Returns the bitwise OR of all non-null input values, or null if none

BIT_XOR( [ ALL

DISTINCT ] value)

Returns the bitwise XOR of all non-null input values, or null if none

STDDEV_POP( [ ALL

DISTINCT ] numeric)

Returns the population standard deviation of numeric across all input values

STDDEV_SAMP( [ ALL

DISTINCT ] numeric)

Returns the sample standard deviation of numeric across all input values

STDDEV( [ ALL

DISTINCT ] numeric)

Synonym for STDDEV_SAMP

VAR_POP( [ ALL

DISTINCT ] value)

Returns the population variance (square of the population standard deviation) of numeric across all input values

VAR_SAMP( [ ALL

DISTINCT ] numeric)

Returns the sample variance (square of the sample standard deviation) of numeric across all input values

COVAR_POP(numeric1, numeric2)

Returns the population covariance of the pair (numeric1, numeric2) across all input values

COVAR_SAMP(numeric1, numeric2)

Returns the sample covariance of the pair (numeric1, numeric2) across all input values

REGR_COUNT(numeric1, numeric2)

Returns the number of rows where both dependent and independent expressions are not null

REGR_SXX(numeric1, numeric2)

Returns the sum of squares of the dependent expression in a linear regression model

REGR_SYY(numeric1, numeric2)

Returns the sum of squares of the independent expression in a linear regression model

Not implemented:

  • REGR_AVGX(numeric1, numeric2)

  • REGR_AVGY(numeric1, numeric2)

  • REGR_INTERCEPT(numeric1, numeric2)

  • REGR_R2(numeric1, numeric2)

  • REGR_SLOPE(numeric1, numeric2)

  • REGR_SXY(numeric1, numeric2)

Window functions

Syntax:

windowedAggregateCall:
        agg( [ ALL | DISTINCT ] value [, value ]*)
        [ RESPECT NULLS | IGNORE NULLS ]
        [ WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) ]
        [ FILTER (WHERE condition) ]
        OVER window
    |   agg(*)
        [ FILTER (WHERE condition) ]
        OVER window

where agg is one of the operators in the following table, or a user-defined aggregate function.

DISTINCT, FILTER and WITHIN GROUP are as described for aggregate functions.

Operator syntax Description

COUNT(value [, value ]*) OVER window

Returns the number of rows in window for which value is not null (wholly not null if value is composite)

COUNT(*) OVER window

Returns the number of rows in window

AVG(numeric) OVER window

Returns the average (arithmetic mean) of numeric across all values in window

SUM(numeric) OVER window

Returns the sum of numeric across all values in window

MAX(value) OVER window

Returns the maximum value of value across all values in window

MIN(value) OVER window

Returns the minimum value of value across all values in window

RANK() OVER window

Returns the rank of the current row with gaps; same as ROW_NUMBER of its first peer

DENSE_RANK() OVER window

Returns the rank of the current row without gaps; this function counts peer groups

ROW_NUMBER() OVER window

Returns the number of the current row within its partition, counting from 1

FIRST_VALUE(value) OVER window

Returns value evaluated at the row that is the first row of the window frame

LAST_VALUE(value) OVER window

Returns value evaluated at the row that is the last row of the window frame

LEAD(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

LAG(value, offset, default) OVER window

Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL

NTH_VALUE(value, nth) OVER window

Returns value evaluated at the row that is the nth row of the window frame

NTILE(value) OVER window

Returns an integer ranging from 1 to value, dividing the partition as equally as possible

Note:

  • You may specify null treatment (IGNORE NULLS, RESPECT NULLS) for FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD and LAG functions. The syntax handled by the parser, but only RESPECT NULLS is implemented at runtime.

Not implemented:

  • COUNT(DISTINCT value [, value ]*) OVER window

  • APPROX_COUNT_DISTINCT(value [, value ]*) OVER window

  • PERCENT_RANK(value) OVER window

  • CUME_DIST(value) OVER window

Grouping functions

Operator syntax Description

GROUPING(expression [, expression ]*)

Returns a bit vector of the given grouping expressions

GROUP_ID()

Returns an integer that uniquely identifies the combination of grouping keys

GROUPING_ID(expression [, expression ]*)

Synonym for GROUPING

Grouped window functions

Grouped window functions occur in the GROUP BY clause and define a key value that represents a window containing several rows.

CTUMBLE

In streaming queries, TUMBLE assigns a window for each row of a relation based on a timestamp column. An assigned window is specified by its beginning and ending. All assigned windows have the same length, and that’s why tumbling sometimes is named as “fixed windowing”. CTUMBLE function works in a similar way as TUMBLE does, but for normal tables.

Operator syntax Description

CTUMBLE(datetime COLUMN, interval [, time ])

Indicates a tumbling window of interval for datetime, optionally aligned at time

Here is an example:

SELECT CAST(CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000') AS TIMESTAMP) AS GTIMEST,
       keyid,
       SUM(metric1) AS SUMM1,
       SUM(metric2) AS SUMM2,
FROM TIMEMETRICS
WHERE keyid <= '1'
GROUP BY CTUMBLE(TIMEST, INTERVAL '60' MINUTE, TIMESTAMP '1970-01-01 01:01:00.000'), keyid
ORDER BY 1,2

In the query above you will be getting the addition of the metrics for every key and for every hour (60 minute) interval. The aligned at value is causing the intervals to start at minute 01

The result obtained would look similar to:

GTIMEST KEYID SUMM1 SUMM2

2020-09-27 17:01:00

0

10

2.5

2020-09-27 17:01:00

1

10

2.5

2020-09-27 18:01:00

0

20

5.0

2020-09-27 18:01:00

1

20

5.0

2020-09-27 19:01:00

0

30

7.5

2020-09-27 19:01:00

1

30

7.5

JSON Functions

In the following:

  • jsonValue is a character string containing a JSON value;

  • path is a character string containing a JSON path expression; mode flag strict or lax should be specified in the beginning of path.

Query Functions

Operator syntax Description

JSON_EXISTS(jsonValue, path [ { TRUE | FALSE | UNKNOWN | ERROR ) ON ERROR } )

Whether a jsonValue satisfies a search criterion described using JSON path expression path

JSON_VALUE(jsonValue, path [ RETURNING type ] [ { ERROR | NULL | DEFAULT expr } ON EMPTY ] [ { ERROR | NULL | DEFAULT expr } ON ERROR ] )

Extract an SQL scalar from a jsonValue using JSON path expression path

JSON_QUERY(jsonValue, path [ { WITHOUT [ ARRAY ] | WITH [ CONDITIONAL | UNCONDITIONAL ] [ ARRAY ] } WRAPPER ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON EMPTY ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON ERROR ] )

Extract a JSON object or JSON array from jsonValue using the path JSON path expression

Note:

  • The ON ERROR and ON EMPTY clauses define the fallback behavior of the function when an error is thrown or a null value is about to be returned.

  • The ARRAY WRAPPER clause defines how to represent a JSON array result in JSON_QUERY function. The following examples compare the wrapper behaviors.

Example Data:

{"a": "[1,2]", "b": [1,2], "c": "hi"}

Comparison:

Operator $.a $.b $.c

JSON_VALUE

[1, 2]

error

hi

JSON QUERY WITHOUT ARRAY WRAPPER

error

[1, 2]

error

JSON QUERY WITH UNCONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[ [1,2] ]

[ “hi” ]

JSON QUERY WITH CONDITIONAL ARRAY WRAPPER

[ “[1,2]” ]

[1,2]

[ “hi” ]

Not implemented:

  • JSON_TABLE

Constructor Functions

Operator syntax Description

JSON_OBJECT( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct JSON object using a series of key (name) value (value) pairs

JSON_OBJECTAGG( { [ KEY ] name VALUE value [ FORMAT JSON ] | name : value [ FORMAT JSON ] } [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON object using a key (name) value (value) pair

JSON_ARRAY( { value [ FORMAT JSON ] } * [ { NULL | ABSENT } ON NULL ] )

Construct a JSON array using a series of values (value)

JSON_ARRAYAGG( value [ FORMAT JSON ] [ ORDER BY orderItem [, orderItem ]* ] [ { NULL | ABSENT } ON NULL ] )

Aggregate function to construct a JSON array using a value (value)

Note:

  • The flag FORMAT JSON indicates the value is formatted as JSON character string. When FORMAT JSON is used, the value should be de-parse from JSON character string to a SQL structured value.

  • ON NULL clause defines how the JSON output represents null values. The default null behavior of JSON_OBJECT and JSON_OBJECTAGG is NULL ON NULL, and for JSON_ARRAY and JSON_ARRAYAGG it is ABSENT ON NULL.

  • If ORDER BY clause is provided, JSON_ARRAYAGG sorts the input rows into the specified order before performing aggregation.

Comparison Operators

Operator syntax Description

jsonValue IS JSON [ VALUE ]

Whether jsonValue is a JSON value

jsonValue IS NOT JSON [ VALUE ]

Whether jsonValue is not a JSON value

jsonValue IS JSON SCALAR

Whether jsonValue is a JSON scalar value

jsonValue IS NOT JSON SCALAR

Whether jsonValue is not a JSON scalar value

jsonValue IS JSON OBJECT

Whether jsonValue is a JSON object

jsonValue IS NOT JSON OBJECT

Whether jsonValue is not a JSON object

jsonValue IS JSON ARRAY

Whether jsonValue is a JSON array

jsonValue IS NOT JSON ARRAY

Whether jsonValue is not a JSON array

The following operators are not in the SQL standard.

Operator syntax Description

JSON_TYPE(jsonValue)

Returns a string value indicating the type of a jsonValue

JSON_DEPTH(jsonValue)

Returns an integer value indicating the depth of a jsonValue

JSON_PRETTY(jsonValue)

Returns a pretty-printing of jsonValue

JSON_LENGTH(jsonValue [, path ])

Returns a integer indicating the length of jsonValue

JSON_KEYS(jsonValue [, path ])

Returns a string indicating the keys of a JSON jsonValue

JSON_REMOVE(jsonValue, path[, path])

Removes data from jsonValue using a series of path expressions and returns the result

Note:

  • JSON_TYPE / JSON_DEPTH / JSON_PRETTY return null if the argument is null

  • JSON_LENGTH / JSON_KEYS / JSON_REMOVE return null if the first argument is null

  • JSON_TYPE generally returns an upper-case string flag indicating the type of the JSON input. Currently supported supported type flags are:

    • INTEGER

    • STRING

    • FLOAT

    • DOUBLE

    • LONG

    • BOOLEAN

    • DATE

    • OBJECT

    • ARRAY

    • NULL

  • JSON_DEPTH defines a JSON value’s depth as follows:

    • An empty array, empty object, or scalar value has depth 1;

    • A non-empty array containing only elements of depth 1 or non-empty object containing only member values of depth 1 has depth 2;

    • Otherwise, a JSON document has depth greater than 2.

  • JSON_LENGTH defines a JSON value’s length as follows:

    • A scalar value has length 1;

    • The length of array or object is the number of elements is contains.

Usage Examples:

JSON_TYPE example

SQL

SELECT JSON_TYPE(v) AS c1,
  JSON_TYPE(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
  JSON_TYPE(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
  JSON_TYPE(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

Result

c1 c2 c3 c4

OBJECT

ARRAY

INTEGER

BOOLEAN

JSON_DEPTH example

SQL

SELECT JSON_DEPTH(v) AS c1,
  JSON_DEPTH(JSON_VALUE(v, 'lax $.b' ERROR ON ERROR)) AS c2,
  JSON_DEPTH(JSON_VALUE(v, 'strict $.a[0]' ERROR ON ERROR)) AS c3,
  JSON_DEPTH(JSON_VALUE(v, 'strict $.a[1]' ERROR ON ERROR)) AS c4
FROM (VALUES ('{"a": [10, true],"b": "[10, true]"}')) AS t(v)
LIMIT 10;

Result

c1 c2 c3 c4

3

2

1

1

JSON_LENGTH example

SQL

SELECT JSON_LENGTH(v) AS c1,
  JSON_LENGTH(v, 'lax $.a') AS c2,
  JSON_LENGTH(v, 'strict $.a[0]') AS c3,
  JSON_LENGTH(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true]}')) AS t(v)
LIMIT 10;

Result

c1 c2 c3 c4

1

2

1

1

JSON_KEYS example

SQL

SELECT JSON_KEYS(v) AS c1,
  JSON_KEYS(v, 'lax $.a') AS c2,
  JSON_KEYS(v, 'lax $.b') AS c2,
  JSON_KEYS(v, 'strict $.a[0]') AS c3,
  JSON_KEYS(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true],"b": {"c": 30}}')) AS t(v)
LIMIT 10;

Result

c1 c2 c3 c4 c5

[“a”, “b”]

NULL

[“c”]

NULL

NULL

JSON_REMOVE example

SQL

SELECT JSON_REMOVE(v, '$[1]') AS c1
FROM (VALUES ('["a", ["b", "c"], "d"]')) AS t(v)
LIMIT 10;

Result

c1

[“a”, “d”]

Copying Data from one Table to another

Currently, the syntax CREATE TABLE …​ AS SELECT …​ is not allowed. If you want to copy data from one table to another, you first need to create the new table and then do INSERT/UPSERT …​ SELECT.

CREATE TABLE TDEST (f1_key BIGINT, f2_value VARBINARY, PRIMARY KEY(f1_key));

INSERT INTO TDEST SELECT * FROM TORIG;