Data Definition Language (DDL) Syntax

The SQL Data Definition Language (DDL) provides users with the capability to create tables, indexes, sequences, views, online aggregates, as well as databases, users, roles, and permissions. These are the supported statements:

	| createDatabaseStatement
	| dropDatabaseStatement
	| createSchemaStatement
	| dropSchemaStatement
	| createTableStatement
	| dropTableStatement
	| truncateTableStatement
	| alterTableStatement
	| createIndexStatement
	| dropIndexStatement
	| createViewStatement
	| dropViewStatement
	| createSequenceStatement
	| dropSequenceStatement
	| createOnlineAggregate
	| dropOnlineAggregate
	| createUserStatement
	| dropUserStatement
	| grantStatement
	| revokeStatement
	| createRoleStatement
	| dropRoleStatement
	| createTriggerStatement
	| dropTriggerStatement
	| enableTriggerStatement
	| disableTriggerStatement


The CREATE DATABASE statement creates a new database:

	CREATE DATABASE databaseName [USER userName IDENTIFIED BY 'password']*

The user LXADMIN is administrator of all databases in the system.

The DROP DATABASE statement removes a database:

	DROP DATABASE databaseName
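
As a minimal sketch of the two statements above, the following creates a database together with one initial user and later removes it. The names SALESDB and ANALYST and the password are hypothetical placeholders, not values from this manual.

```sql
-- Create a database and, optionally, an initial user for it (illustrative names).
CREATE DATABASE SALESDB USER ANALYST IDENTIFIED BY 'changeMe';

-- Remove the database when it is no longer needed.
DROP DATABASE SALESDB;
```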

1.1. Predefined Database, Schema, and Admin User

LeanXcale has a predefined database named DB and a predefined schema named APP in every database. The user LXADMIN is the administrator of all schemas of all databases in the system.


The CREATE SCHEMA AUTHORIZATION statement performs a batch of table and view creations and permission grants on a given schema:

		[ createAndGrantStatements ]*

	| createViewStatement
	| grantStatement

Note that the schema must already exist, that is, there must be a user with the schema name, because the statement only performs the specified batch of creates and grants.

3. CREATE TABLE Statement

The CREATE TABLE statement creates a table. Its syntax is:

	[ IF NOT EXISTS ] name
	[ '(' tableElement [, tableElement ]* ')' ]
	[ AS queryStatement ] [ partitionBySpec [partitionBySpec]*  [ distributeBySpec  ]* ]

3.1. Table Types

The purpose of table creation is to define the structure of the table, including column names, attributes, and data types. There are three types of tables: regular, cache, and delta.

Regular tables are created using CREATE TABLE. Cache tables (CREATE CACHE TABLE) are always stored in memory, in addition to disk, to ensure rapid access. Delta tables (CREATE DELTA TABLE) include delta columns, allowing the accumulation of values with conflict-free updates, such as a sum.

3.2. IF NOT EXISTS Clause

Attempting to create a table that already exists results in an error. To avoid this error, the IF NOT EXISTS clause can be used, ensuring that the table is only created if it does not already exist.
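
A short sketch of the clause in use follows. The table CUSTOMER, its columns, and the chosen data types are hypothetical and only serve as an illustration.

```sql
-- The first execution creates the table; running the same statement again
-- is expected to do nothing instead of raising an "already exists" error.
CREATE TABLE IF NOT EXISTS CUSTOMER (
    ID INTEGER NOT NULL PRIMARY KEY,
    NAME VARCHAR
);
```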

3.3. Old Versions

LeanXcale, like most modern databases, employs Multi-Version Concurrency Control (MVCC). Under MVCC, updates are not executed in place; instead, a new version of the tuple is generated with each update. This mechanism enables efficient concurrency control by avoiding read-write conflicts.

In specific scenarios, when it is known that the application exclusively performs either data insertion or data reading, but not both concurrently, multi-versioning can be disabled with the NO OLD VERSIONS clause. Doing so removes the overhead of keeping old versions and makes insert and update operations more efficient.

3.4. Column Names, Types, Constraints, Online Aggregates

The list of tableElements specifies the column names and any attributes associated with them. A tableElement can be either a column specification or a constraint specification, as shown in the following syntax:

	| [ CONSTRAINT name ] tableConstraint

A tableColumn has the following syntax:

	columnName [kindOfDeltaColumn] type [ NOT NULL ] [ columnGenerator ] [ PRIMARY KEY ]

In its most fundamental form, a column specification consists of a column name followed by its data type. The inclusion of the modifier NOT NULL imposes a constraint, indicating that the column cannot contain NULL values. Additionally, the PRIMARY KEY modifier designates that the column is a part of the primary key for the table.

Columns can also be automatically generated in three distinct ways. The first method involves providing a default value to be used when no explicit value is provided. The second method employs an auto-incrementing value, achieved through the use of the AS IDENTITY clause. The third method utilizes a delta column, wherein updates and inserts are treated as an accumulation function over the current value of the column. The first two ways are indicated in the columnGenerator syntax:

	DEFAULT expression
	[ START WITH initialValue ]
	[ INCREMENT BY incrementValue ]

The AS IDENTITY clause indicates that the column is an auto-increment key. The form GENERATED ALWAYS AS IDENTITY is also supported for compatibility with the PostgreSQL dialect. The START WITH subclause specifies the first value given to the auto-increment column. The INCREMENT BY subclause specifies the value by which the auto-increment column is incremented.
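
The two column generators described above can be sketched as follows. The table ORDERS, its columns, and the data types are hypothetical, and the exact placement of the AS IDENTITY subclauses is an assumption based on the columnGenerator syntax shown in this section.

```sql
CREATE TABLE ORDERS (
    -- Auto-increment key intended to start at 1000 and advance in steps of 10.
    ORDER_ID BIGINT AS IDENTITY START WITH 1000 INCREMENT BY 10,
    CUSTOMER_ID INTEGER NOT NULL,
    -- Default value used when an insert does not provide STATUS explicitly.
    STATUS VARCHAR DEFAULT 'NEW',
    PRIMARY KEY (ORDER_ID)
);
```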

The third way is specified with the kindOfDeltaColumn syntax:

kindOfDeltaColumn =
	( SUM | MIN | MAX | COUNT )

The accumulation function determines how values are aggregated. For instance, with the SUM function, each update sets the column to its previous value plus the value being written. The MIN and MAX functions keep the minimum or maximum, respectively, of the current value of the column and the new value introduced by the update. COUNT increments the previous value of the column by one, irrespective of the value being written.
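
A delta table using the four accumulation functions might be declared as in the sketch below. The table and column names are hypothetical, and the column layout simply follows the tableColumn syntax given earlier (columnName, then kindOfDeltaColumn, then type).

```sql
CREATE DELTA TABLE ACCOUNT_TOTALS (
    ACCOUNT_ID BIGINT NOT NULL PRIMARY KEY,
    BALANCE SUM DOUBLE,      -- each write adds its value to the stored balance
    LOWEST MIN DOUBLE,       -- keeps the smallest value written so far
    HIGHEST MAX DOUBLE,      -- keeps the largest value written so far
    NUM_OPS COUNT BIGINT     -- grows by one per write, whatever value is written
);
```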

Additionally, various constraints can be specified over the columns. This is the constraint syntax:

	PRIMARY KEY '(' columnName [, columnName ]* ')'
	| PRIMARY GEOHASH KEY '(' latitudeColumnName , longitudeColumnName ')'
	| PRIMARY GEOHASH KEY '(' geoColumnName ')'
	| FOREIGN KEY '(' columnName [, columnName ]* ')'
		REFERENCES tableName '(' columnName [, columnName ]* ')'
	| UNIQUE '(' columnName [, columnName ]* ')'
	| GEOHASH '(' latitudeColumnName , longitudeColumnName ')'
	| GEOHASH '(' geoColumnName ')'
	| CHECK expression

The PRIMARY KEY constraint defines the columns that constitute the primary key of a table and establishes the order in which they are arranged.

The PRIMARY GEOHASH KEY constraint defines a geohash primary key. It can be defined in two ways. The first is to provide two FLOAT or DOUBLE columns holding the latitude and longitude. The second is to provide a geo column, that is, a STRING column that uses the WKT markup language to represent a vector geometry object. The constraint generates a hidden STRING column with the geohash of the indicated column(s). This hidden column is the one used as the primary key.

The FOREIGN KEY constraint indicates that one or more columns within a table refer to the primary key in another table. Following the FOREIGN KEY keyword, the specific columns in the current table are listed. Subsequently, the REFERENCES keyword is used to identify the table name of the referenced table, along with the columns serving as primary keys in the referenced table.

The GEOHASH constraint indicates which columns to use to generate a hidden STRING geohash column, over which a secondary index is created. This geohash secondary index speeds up geo searches, such as checking whether a point is contained in a geometric shape. A geohash column can also be used as part of a composite primary key, for instance, PRIMARY KEY (country, geoHashCol). In this example, one can search by country and a geo condition: the search finds the first row of the country and then performs a geo search within that country.

The UNIQUE constraint mandates that the specified column or set of columns consistently maintain distinct values across various rows in the table.

The CHECK constraint ensures that the stipulated condition is met for any values inserted into the associated column.
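
The constraints described above can be combined in a single table definition, as in this sketch. All names are hypothetical, and a table ORDERS with primary key ORDER_ID is assumed to exist already. The foreign key is written as FOREIGN KEY followed by REFERENCES, following the description of the FOREIGN KEY constraint.

```sql
CREATE TABLE ORDER_LINE (
    ORDER_ID BIGINT NOT NULL,
    LINE_NO INTEGER NOT NULL,
    SKU VARCHAR NOT NULL,
    QUANTITY INTEGER NOT NULL,
    -- Composite primary key; the column order defines the key order.
    CONSTRAINT PK_ORDER_LINE PRIMARY KEY (ORDER_ID, LINE_NO),
    -- Each line must refer to an existing order.
    CONSTRAINT FK_ORDER FOREIGN KEY (ORDER_ID) REFERENCES ORDERS (ORDER_ID),
    -- A product may appear at most once per order.
    CONSTRAINT UQ_ORDER_SKU UNIQUE (ORDER_ID, SKU),
    -- Reject non-positive quantities.
    CONSTRAINT CK_QUANTITY CHECK (QUANTITY > 0)
);
```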

3.5. AS Clause

The AS clause defines the table schema based on the result set of a query statement. The table is then populated with the rows from the result set. When you specify the AS clause, you may omit the list of tableElements, since they are taken from the result set, or you can provide the columnName and omit the data type for any tableElement, in which case the underlying column is renamed.
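
Both forms of the AS clause are sketched below. The source table ORDERS and all other names are hypothetical.

```sql
-- Column names and types are taken from the query result.
CREATE TABLE BIG_ORDERS AS
    SELECT ORDER_ID, CUSTOMER_ID FROM ORDERS WHERE CUSTOMER_ID > 1000;

-- Only column names are listed (no types), so the result columns
-- are renamed to ID and CUSTOMER.
CREATE TABLE BIG_ORDERS_RENAMED (ID, CUSTOMER) AS
    SELECT ORDER_ID, CUSTOMER_ID FROM ORDERS WHERE CUSTOMER_ID > 1000;
```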

3.6. PARTITION BY Clause

The PARTITION BY clause determines the fragmentation strategy of a table, which is critical for operational efficiency. If no partitioning is specified for a table, all the workload of the table is handled by a single storage server and, therefore, at most one core is used to process that workload. If partitioning is specified, one can also indicate the criteria for distributing the partitions. The PARTITION BY clause syntax is as follows:

partitionByClause =
	KEY `(` columnName [, columnName]* `)` AT  tupleLiteral [, tupleLiteral]*
	| DIMENSION  columnName
		AT  literal [ , literal ]*
		| EVERY intervalLiteral
			[KEEP intervalLiteral]
		| EVERY unsignedIntegerLiteral
			[KEEP unsignedIntegerLiteral]
	| HASH `(` columnName [, columnName ]* `)`

	`(` literal [ , literal ]* `)`

LeanXcale supports several methods for table partitioning that can be used in isolation or in a combined way:

  • Prefix of Primary Key: The table can be partitioned based on a prefix of its primary key.

  • Dimension-based Partitioning: Partitioning can be performed using a specified dimension, typically another column in the table. Each PARTITION BY DIMENSION clause has a single column, and each clause adds a new dimension to the partitioning. The column should be part of the primary key. The primary key columns used as partitioning dimensions should form a suffix of the primary key (never including the first column of the primary key, since that case is covered by PARTITION BY KEY).

  • Auto-partitioning on Dimension Columns of temporal or auto-increment nature.

  • Hash partitioning: It can be specified over one or more columns of the primary key. It computes a hash over the specified columns and the value is stored in a new hidden column named HASHID. This column becomes the last column of the table and the last column of the primary key. There are as many partitions as storage servers.

Auto-partitioning can be applied to dimension columns of type TIMESTAMP, TIME, DATE, or an auto-increment integer. This automatic partitioning is particularly important for historical tables whose data spans over time, such as sales or time series. As tables grow in size, especially historical ones, accessing them becomes increasingly slower. Auto-partitioning based on temporal or auto-increment fields keeps table fragments manageable in size, which preserves access efficiency. In historical tables, the temporal column or auto-increment counter is the one that can be used for auto-partitioning. Since in historical tables the last fragment is the one consistently accessed, while the other fragments are accessed infrequently, auto-partitioning ensures that only the actively used fragments consume resources, mitigating the inefficiencies associated with accessing large historical datasets.

When using auto-partitioning with TIMESTAMP, the time span should be specified with INTERVAL. For example:

  • INTERVAL '90' DAY.

The PARTITION BY KEY AT clause partitions data based on a prefix of the primary key (including the whole key).

On the other hand, the PARTITION BY DIMENSION clause allows partitioning using any other column designated as a dimension in the CREATE TABLE statement, e.g., 'state'. This feature ensures that all rows with a specified range of values in the chosen dimension are grouped into the same partition.

Within the AT subclause, specific split points for partitioning are provided. In contrast, the EVERY subclause triggers automatic partitioning at specified intervals, streamlining the process of creating partitions based on the defined dimension.
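
The following sketch combines explicit split points on a primary key prefix with automatic partitioning on a time dimension. The table SALES, its columns, and the split values are hypothetical, and the clause spelling follows the partitionByClause syntax above, prefixed with PARTITION BY.

```sql
CREATE TABLE SALES (
    STORE_ID INTEGER NOT NULL,
    SOLD_AT TIMESTAMP NOT NULL,
    AMOUNT DOUBLE,
    PRIMARY KEY (STORE_ID, SOLD_AT)
)
-- Explicit split points on a prefix of the primary key.
PARTITION BY KEY (STORE_ID) AT (100), (200), (300)
-- A new partition on the time dimension every 90 days.
PARTITION BY DIMENSION SOLD_AT EVERY INTERVAL '90' DAY;
```

In this layout SOLD_AT is a suffix of the primary key, as required for dimension columns, while STORE_ID is handled by PARTITION BY KEY.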


If partitions have been specified, then it is possible to indicate how to distribute the partitions across the storage servers with the distribute by clause. It has the following syntax:


Data can be distributed according to one of several criteria in LeanXcale:

  • NO DISTRIBUTE. It does not distribute the table, so it will be stored at a single storage server that will handle all the workload for that table.

  • DISTRIBUTE BY KEY. Utilizes the primary key partitions to distribute across storage servers. Cannot be combined with other DISTRIBUTE BY clauses.

  • DISTRIBUTE BY DIMENSION. Uses the specified partitioned dimension column to distribute.

  • DISTRIBUTE BY HASH. Uses the specified hashing to distribute. Cannot be combined with other DISTRIBUTE BY clauses.

  • Default: If no DISTRIBUTE clause is specified, automatic distribution is used, i.e., the system decides how to distribute the partitions.

With hash partitioning, a new column, HASHID, is appended to the primary key as its last column. For insert operations, the HASHID column should be filled with 0, and the database replaces it with the right value. As many partitions are created as there are storage engines in the database. DISTRIBUTE BY HASH cannot be combined with other distribution methods: if HASH partitioning has been specified, no other distribution criterion is allowed.

The HASH partitioning and distribution method is straightforward to specify because it does not require knowledge of the data distribution over column ranges. However, it comes with a significant tradeoff: reads become significantly more expensive, since they must be performed across all partitions of the table. Computing the hash also has an overhead, so include only the columns that are really necessary for an even partitioning. Conversely, using fewer columns than necessary results in a skewed partitioning.

When using DISTRIBUTE BY KEY, the partitions defined over the primary key are distributed among the storage servers. If there are fewer primary key partitions than storage servers, only a subset of the storage servers is used for the table. It is recommended to have as many partitions as storage servers, or a multiple of that number.

DISTRIBUTE BY DIMENSION behaves in the same way as DISTRIBUTE BY KEY. Only the partitions that have been defined are distributed among the storage servers. If there are fewer partitions than storage servers, only a subset of them is used. Again, it is recommended that the number of partitions used for distribution matches the number of storage servers.

If NO DISTRIBUTE is indicated, all the table fragments remain on a single storage server. Note that even with several storage servers, which is the common case, whole tables can be spread across the storage servers, so distribution is not strictly needed on a per-table basis. That said, in most cases the most effective way to distribute the workload is to distribute all tables across all storage servers. This is especially true for large tables over which large queries are performed, and for tables with high loads of any other kind, either ingestion or queries.

If partitioning is specified but no explicit distribution method is defined, automatic distribution is performed as follows: if hash partitioning is defined, it is used as the distribution criterion; otherwise, the partitions are distributed across the storage servers.
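
The sketch below illustrates two of the distribution criteria. The exact DISTRIBUTE BY syntax is not reproduced in this section, so the spelling of the clauses is an assumption taken from the names in the list of criteria above. The tables and columns are hypothetical.

```sql
-- Hash partitioning over the key, distributed by the same hash.
-- A hidden HASHID column is expected to be appended to the primary key.
CREATE TABLE EVENTS (
    EVENT_ID BIGINT NOT NULL,
    PAYLOAD VARCHAR,
    PRIMARY KEY (EVENT_ID)
)
PARTITION BY HASH (EVENT_ID)
DISTRIBUTE BY HASH;

-- Key-range partitions distributed among the storage servers.
-- Three split points yield four partitions, e.g. one per server on four servers.
CREATE TABLE READINGS (
    SENSOR_ID INTEGER NOT NULL,
    READ_AT TIMESTAMP NOT NULL,
    READING DOUBLE,
    PRIMARY KEY (SENSOR_ID, READ_AT)
)
PARTITION BY KEY (SENSOR_ID) AT (1000), (2000), (3000)
DISTRIBUTE BY KEY;
```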

For more in-depth details, refer to the section on writing efficiently in LeanXcale.


The DROP TABLE statement deletes the specified table. It includes an optional IF EXISTS clause that prevents the statement from raising an error if the table does not exist. The syntax is:


The TRUNCATE TABLE statement quickly and efficiently removes all rows from a table. Unlike the DELETE statement, which removes rows one by one and generates individual row delete operations, TRUNCATE TABLE is a more direct and faster operation for removing all data from a table. Its syntax is:


5. ALTER TABLE Statement

The ALTER TABLE statement is used to modify the structure of an existing table. It allows you to add, modify, or drop columns, constraints, indexes, or perform other changes to the table definition. Its syntax is:

	ALTER TABLE name alterTableAction

	RENAME TO newName
	| DROP COLUMN columnName
	| DROP constraintName
	| ADD COLUMN columnName columnType
	| ADD [ CONSTRAINT name ] tableConstraint
		KEY `(` columnName [, columnName]* `)` AT tupleLiteral [, tupleLiteral]*
		| DIMENSION columnName AT literal [, literal]*
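
A few of the actions listed above are sketched here against a hypothetical table ORDERS with columns CUSTOMER_ID and STATUS. All names are illustrative only.

```sql
-- Add a column, then remove it again.
ALTER TABLE ORDERS ADD COLUMN NOTES VARCHAR;
ALTER TABLE ORDERS DROP COLUMN NOTES;

-- Add a named constraint to the existing table.
ALTER TABLE ORDERS ADD CONSTRAINT UQ_CUSTOMER_STATUS UNIQUE (CUSTOMER_ID, STATUS);

-- Rename the table.
ALTER TABLE ORDERS RENAME TO CUSTOMER_ORDERS;
```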


Online aggregates are a novel feature of LeanXcale. An online aggregate can be thought of as a materialized aggregation view with very efficient real-time updates: it is updated as part of the transaction that modifies the parent table, i.e., the table from which it is derived. The syntax is as follows:

		[ IF NOT EXISTS ] [ NO OLD VERSIONS] aggregateName
		AS { * | projectItem [, projectItem ]* }
	FROM tableExpression
		[WHERE booleanExpression]
			[ GROUP BY { * | projectItem [, projectItem ]* } ]
				[ ( AUTOSPLIT columnName splitperiod [ AUTOREMOVE AFTER persistperiod ]
					| SPLIT columnName EVERY numericOrIntervalExpression [ AUTOREMOVE AFTER numericOrIntervalExpression ]

The optional "DURABLE" parameter determines the persistence of the online aggregate in relation to the parent table. When "DURABLE" is specified, the online aggregate remains intact even if the parent table is deleted. When "DURABLE" is not specified, deleting the parent table cascades and also removes the associated online aggregate.

The "IF NOT EXISTS" clause, as in the CREATE TABLE statement, prevents an error if the online aggregate already exists.

The "NO OLD VERSIONS" clause, as in the CREATE TABLE statement, deactivates multi-versioning. Consequently, only the latest version of each aggregate is retained, and versions corresponding to transaction snapshots cannot be retrieved.

The "AS" clause specifies the aggregates that the online aggregate keeps updated (such as SUMs, COUNTs, MINs, MAXs) and the columns of the parent table over which these aggregates are computed.

The FROM clause indicates the parent table from which the online aggregate is derived. The optional WHERE clause sets a boolean condition that determines which rows of the parent table are considered by the online aggregate.

The optional GROUP BY clause specifies the grouping criteria for computing the aggregates. If it is omitted, the aggregation covers all rows that satisfy the filter condition, or all rows if no filter condition is provided.

Given that online aggregates are commonly defined over historical tables, auto-splitting is important. The AUTOSPLIT clause may be specified over a temporal column or an auto-increment integer column. The optional AUTOREMOVE clause specifies how data should be pruned after a certain period. Note that the intervals specified in "AUTOSPLIT" and "AUTOREMOVE" should match those of the parent table or be a multiple thereof.

An online aggregate can be deleted with DROP ONLINE AGGREGATE:

	DROP ONLINE AGGREGATE onlineaggregateName
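
As a hedged illustration, an online aggregate over a hypothetical parent table SALES might look like the sketch below. The leading keywords CREATE ONLINE AGGREGATE are an assumption inferred from the DROP ONLINE AGGREGATE statement, and all table, column, and aggregate names are invented for the example.

```sql
-- Per-store totals, maintained within the transactions that modify SALES.
CREATE ONLINE AGGREGATE IF NOT EXISTS SALES_BY_STORE
    AS STORE_ID, SUM(AMOUNT) AS TOTAL_AMOUNT, COUNT(*) AS NUM_SALES
    FROM SALES
    WHERE AMOUNT > 0
    GROUP BY STORE_ID;

-- Remove the online aggregate when it is no longer needed.
DROP ONLINE AGGREGATE SALES_BY_STORE;
```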


Secondary indexes optimize access through columns that do not constitute a prefix of the primary key. They are created with the CREATE INDEX statement, which has the following syntax:

	ON tableName '(' columnName [, columnName ]* ')'
		[INCLUDE '(' columnName [, columnName ]* ')']
		[GLOBAL [ partitionBySpec [ distributeBySpec ] ] ]

The optional "UNIQUE" clause serves to specify that only a singular value can exist for each combination of columns within the index. This uniqueness constraint ensures that the indexed columns collectively have distinct values.

The index is delineated over a specific table, as denoted by the ON clause. Following the specification of the tableName, a sequence of columns constituting the secondary key is enumerated within parentheses. These columns define the structure of the secondary index, facilitating efficient retrieval of data based on the specified column combinations.

The INCLUDE clause defines covered indexes, that is, indexes that include additional columns. This is useful for optimizing queries over the secondary index, because the values of those columns can be retrieved without performing a search on the primary key.

The GLOBAL clause indicates that the index will be global. Without the GLOBAL clause, indexes are local to each table fragment. That is, each table fragment has a fragment of the secondary index that only contains secondary keys for the tuples stored in the associated fragment of the primary table. Local secondary indexes are convenient because the tuple can be read within the same storage server. However, there is a tradeoff: when searching with a secondary key, the query is sent to all storage servers containing a fragment of the table, with the consequent overhead.

Global indexes, instead, are like a table and can be stored on any storage server. In fact, one can specify the partitioning and distribution strategy as for regular tables. Searches in global indexes only involve the storage servers containing a fragment that has secondary keys in the searched range, so only the storage servers containing relevant information are contacted. However, after retrieving the primary keys of the tuples satisfying the query, those tuples have to be looked up on the storage servers containing them.

The best option is to combine the global index with an INCLUDE clause listing all the columns that the query can use. In this way, only the storage servers containing secondary keys in the searched interval are contacted, and the query is solved without a second search step. In effect, global covered indexes (GLOBAL plus INCLUDE) act as vertical partitions in columnar data warehouses.

To eliminate an index, the DROP INDEX statement is employed, using the following syntax:


The statement begins with the DROP INDEX keywords, followed by an optional "IF EXISTS" clause, which ensures that the operation does not result in an error if the specified index does not exist. The indexName parameter designates the identifier of the index intended for removal. This process effectively dismantles the specified index, revoking its association with the corresponding table and freeing up resources.
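
The index variants discussed in this section can be sketched as follows. The leading part of the statement (CREATE, optional UNIQUE, INDEX, and the index name) is an assumption based on the surrounding description, and the table ORDERS and all index names are hypothetical.

```sql
-- Local secondary index: one index fragment per table fragment.
CREATE INDEX IDX_ORDERS_CUSTOMER ON ORDERS (CUSTOMER_ID);

-- Unique index: at most one row per value of the indexed column.
CREATE UNIQUE INDEX IDX_ORDERS_TRACKING ON ORDERS (TRACKING_CODE);

-- Global covered index: queries reading only CUSTOMER_ID and STATUS
-- can be answered from the index without a second lookup by primary key.
CREATE INDEX IDX_ORDERS_CUSTOMER_COV ON ORDERS (CUSTOMER_ID)
    INCLUDE (STATUS)
    GLOBAL;

-- Remove an index; IF EXISTS avoids an error when it is already gone.
DROP INDEX IF EXISTS IDX_ORDERS_CUSTOMER;
```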

8. CREATE VIEW and DROP VIEW Statements

A view is a virtual table derived from the result of a SELECT query. Unlike a physical table, a view does not store the data itself but provides a way to represent the result of a query as if it were a table. Views are useful for simplifying complex queries, abstracting the underlying data model, and controlling access to specific columns or rows of a table. The CREATE VIEW statement has the following syntax:

	CREATE VIEW [ IF NOT EXISTS ] viewName AS query

This creates a view named "viewName" from the result of the query provided in the AS clause.

A view can be removed by the DROP VIEW statement:

	DROP VIEW [ IF EXISTS ] viewName
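
A brief sketch of the view life cycle follows, assuming a hypothetical table ORDERS with columns ORDER_ID, CUSTOMER_ID, and STATUS.

```sql
-- Define a view that exposes only the orders that are still new.
CREATE VIEW IF NOT EXISTS OPEN_ORDERS AS
    SELECT ORDER_ID, CUSTOMER_ID FROM ORDERS WHERE STATUS = 'NEW';

-- The view is queried like a table; the underlying query runs at read time.
SELECT * FROM OPEN_ORDERS;

-- Remove the view; the data in ORDERS is not affected.
DROP VIEW IF EXISTS OPEN_ORDERS;
```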


A sequence is an object used to generate unique numeric values in a specified order. Sequences are commonly used to produce primary key values for tables, ensuring that each new record has a unique identifier. Unlike auto-increment columns, sequences provide greater flexibility and can be shared among multiple tables. Sequences use a BIGINT type (long integer). They are created by means of the CREATE SEQUENCE statement:

	CREATE SEQUENCE [ IF NOT EXISTS ] name sequenceModifier*

	AS type
		|   START WITH initialValue
		|   INCREMENT BY incrementValue
		|   CACHE cacheSize

The sequenceModifiers specify the behavior of the sequence. The START WITH clause indicates the first value of the sequence. The INCREMENT BY clause sets the step size by which the sequence increases or decreases. The CACHE clause makes each query engine instance keep a stripe of cacheSize values from which it assigns sequence identifiers. In this way, the query engine does not have to request the next value of the sequence from a storage engine every time. Using sequences with the CACHE option is the preferred approach.
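
For instance, a cached sequence could be declared as in this sketch. The name ORDER_SEQ and the numeric values are hypothetical.

```sql
-- Starts at 1 and advances by 1; each query engine instance reserves
-- a stripe of 1000 values so it rarely needs to contact a storage engine.
CREATE SEQUENCE IF NOT EXISTS ORDER_SEQ
    START WITH 1
    INCREMENT BY 1
    CACHE 1000;
```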

A sequence can be removed with the DROP SEQUENCE statement:


10. CREATE USER and DROP USER Statements

New users can be created by means of the CREATE USER statement:

	CREATE USER userName IDENTIFIED BY 'password'

LeanXcale has two predefined users: LXADMIN and NONE. User names are case sensitive. When they are written without quotes, they are always stored in uppercase. When they are written between quotes, the capitalization within the quotes is used. Note that in the connection string of the drivers (JDBC, Python, ODBC) user names are always case sensitive.

A user can be removed by:

	DROP USER userName
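
The effect of quoting on user names can be sketched as follows. The user names and passwords are hypothetical, and the comments restate the case rule described above, reading "capitalized" as uppercase.

```sql
-- Unquoted: the name is stored in uppercase, i.e. ANALYST.
CREATE USER analyst IDENTIFIED BY 'changeMe';

-- Quoted: the capitalization inside the quotes is kept, i.e. analyst.
CREATE USER "analyst" IDENTIFIED BY 'changeMe';

-- These are two different users; remove the lowercase one.
DROP USER "analyst";
```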

11. GRANT and REVOKE Statements

Permissions can be granted by:

	GRANT permissionList TO userList
	permission [, permission]*

And revoked by:

	REVOKE permissionList TO userList

A permission has two parts:

	action databaseObject

The supported actions are:


The supported database objects are:


12. CREATE ROLE and DROP ROLE Statements

A role is a collection of permissions. In that sense, a role is similar to a user. The syntax is:

	CREATE ROLE roleName IDENTIFIED BY 'password'

Permissions can be granted to a role, and a user can be granted roles, which is equivalent to granting the user each of the permissions in the role.

A role can be deleted with:

	DROP ROLE roleName


A trigger is a predefined action that is automatically executed ("triggered") in response to certain events on a particular table or view. These events can include data modification statements (such as INSERT, UPDATE, DELETE) or specific system events. Triggers are created by means of CREATE TRIGGER with the following syntax:

	ON tableName
	FOR EACH ROW EXECUTE triggerFunction ['(' stringLiteral ')']
	[PRIORITY intLiteral]

BEFORE and AFTER indicate whether the trigger is executed before or after the triggering event. INSERT, UPDATE, and DELETE specify the triggering event. The ON clause specifies the table on which the trigger is defined. LeanXcale currently supports only row-level triggers, that is, triggers associated with row events. For this reason, the only form allowed is FOR EACH ROW EXECUTE, which marks the trigger as a row-level trigger. triggerFunction is the name of a Java function that contains the code of the trigger. PRIORITY indicates the priority of the trigger; when several triggers are defined, they are executed in priority order.

Triggers can be removed with DROP TRIGGER:

	DROP TRIGGER triggerName

Triggers can be enabled by means of:

	ENABLE TRIGGER triggerName ON tableName

And disabled with:

	DISABLE TRIGGER triggerName ON tableName
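
The trigger statements can be put together as in the sketch below. The leading part of CREATE TRIGGER (the trigger name followed by BEFORE or AFTER and the event) is an assumption based on the description above, and the trigger name, the table ORDERS, and the Java function auditTrigger are hypothetical.

```sql
-- Row-level trigger that runs a (hypothetical) Java function after each insert.
CREATE TRIGGER AUDIT_ORDERS AFTER INSERT
    ON ORDERS
    FOR EACH ROW EXECUTE auditTrigger('orders')
    PRIORITY 1;

-- Temporarily switch the trigger off, then on again.
DISABLE TRIGGER AUDIT_ORDERS ON ORDERS;
ENABLE TRIGGER AUDIT_ORDERS ON ORDERS;

-- Remove the trigger.
DROP TRIGGER AUDIT_ORDERS;
```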