LeanXcale Concepts

In this summary we provide an overview of the basic concepts and some explanations of how LeanXcale deals with them.

1. ACID Transactions

A transaction is a set of one or more SQL statements that make up a logical unit of work that you can either commit or roll back and that will be recovered in the event of a system failure.

The acronym ACID is used to describe the four properties of an enterprise level transaction:

  • Atomicity - Atomicity means that all operations part of a transaction should be done or undone completely.

  • Consistency - A transaction should transform a system from one consistent state to another consistent state.

  • Isolation - Each transaction should happen independently of other transactions occurring at the same time.

  • Durability - Completed transactions should remain permanent, even when there is a system failure right after completion.

1.1. LeanXcale transaction management

LeanXcale transaction management is based on a patented radically new approach to transactional processing that is able to provide transactional data consistency while being able to scale out to large numbers.

1.2. Method

LeanXcale achieves transparent and scalable transaction processing obeying stringent response time requirements by introducing a range of innovations:

  • Decouples the visibility of updates from atomic commit. Any form of complex and time consuming coordination among components, such as a 2-phase commit protocol, is avoided. Instead, transactions can commit fast and in parallel without compromising consistency.

  • Decomposes and distributes the various transactional tasks, scaling them up individually.

  • Introduces asynchrony and message batching as much as possible. Asynchrony enables that most of the messing processing is outside the scope of transactions; batching keeps the overall number of messages down.

1.3. Concurrency

The LeanXcale concurrency control mechanism is based on snapshot isolation, for its advantages regarding conflict avoidance compared to locking and two-phase commit based approaches.

These benefits have been well established both for traditional relational database systems as well as transaction solutions on top NoSQL databases.

2. Atomicity

Atomicity means that all operations part of a transaction should be done or undone completely.

In the event of a failure before the transaction is finished, all operations and procedures should be undone, and all data should rollback to its previous state.

Atomicity states that database modifications must follow this “all or nothing” rule. Each transaction is said to be “atomic.” If one part of the transaction fails, the entire transaction fails.

It is critical that the database management system maintains the atomic nature of transactions in spite of any DBMS, operating system, or hardware failure.

2.1. LeanXcale atomicity

All transactions in LeanXcale are atomic. Any transaction will either be performed completely, or completely rolled back to original state.

3. Consistency

Consistency means that only valid data will be written to the database, according to the database configuration and rules.

Consistency in database systems refers to the requirement that any given database transaction must change affected data only in allowed ways.

Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.

If a transaction is executed that violates the database’s consistency rules, the entire transaction will be rolled back, and the database will be restored to a consistent state.

When a transaction successfully executes, it changes take the database from one consistent state to another consistent state.

Consistency does not guarantee correctness of the transaction in every possible way, only that any programming errors cannot result in the violation of any defined database constraints.

3.1. LeanXcale Consistency modes

COMMIT has a cost and this is specially true if strong session consistency mode is activated.

LeanXcale has two session consistency modes:

  • Durable - resume control before commit is completed.

  • Strong session - check that the commit is fully completed.

3.2. Durable consistency

The default consistency mode is getting control back when the commit is The commit has to be durable and all checks done, but before the commit is guaranteed to be fully completed.

This provides more efficiency, but still predictable read data consistency for a session, while also offering low latency reads and lowest possible latency writes.

3.3. Strong session consistency

Strong session consistency mode also returns before the COMMIT is guaranteed to be fully done, but before starting any new transaction it checks (and waits if needed) that the previous COMMIT is fully completed.

Both session consistency modes provide guaranteed monotonic reads, and writes and ability to read your own writes.

4. Isolation

In database systems, isolation determines transaction integrity - how it is visible to other users and systems.

Isolation requires that multiple transactions occurring at the same time not impact each other’s execution. A transaction against a database performing at the same time as anohter should be isolated from each other.

The database should always perform one entire transaction before the other, or in some other way prevent the two transactions from interfering with each other.

Isolation does not ensure which transaction will execute first — merely that transactions will not interfere with each other.

4.1. Low isolation

A lower isolation level increases the ability of many users to access the same data at the same time, but increases the number of concurrency effects (and possibly anomalies) users might encounter.

4.2. High isolation

Conversely, a higher isolation level reduces the types of concurrency effects that users may encounter, but requires more system resources and increases the chances that one transaction will block another.

Isolation is typically defined at database level as a property that defines how/when the changes made by one operation become visible to other,

5. Snapshot Isolation

Snapshot Isolation is a method for transaction processing that ensures that all reads made in a transaction will see a the same snapshot of the database.

Transactions will only be performed if the updates it contains does not conflict with other updates being made since the snapshot was taken.

Snapshot is a type of multi-version concurrency control that builds a more relaxed isolation criterion that allows you to avoid using read locks. Thus, a transaction is never blocked performing a read operation potentially increasing the level of concurrency.

5.1. Process

Snapshot isolation works by creating a start and a commit timestamp for each transaction. When the transaction begins, it is assigned the start timestamp, this particular transaction will observe all versions up to start.

When the transaction ends it tries to commit and if this succeeds (there is no write conflict), it is assigned the commit timestamp.

When the transaction is committed, other transactions will be able to observe its modifications.

5.2. Benefits

Using Snapshot Isolation you get a strict isolation level and for a wide array of applications it is possible to achieve a serial execution or even to fully implement serializability,

All Snapshot Isolation histories can be mapped to single-valued histories while preserving data dependencies.

5.3. The marbles example

isolation

Snapshot does not guarantee serialization in itself, but this is actually the point:

Imagine that we have a bag containing a mixture of white and black marbles. Suppose that we want to run two transactions. One transaction turns each of the white marbles into black marbles. The second transaction turns each of the black marbles into white marbles. If we run these transactions under serializable isolation, we must run them one at a time. The first transaction will leave a bag with marbles of only one color. After that, the second transaction will change all of these marbles to the other color. There are only two possible outcomes: a bag with only white marbles or a bag with only black marbles.

If we run these transactions under snapshot isolation, there is a third outcome that is not possible under serializable isolation. Each transaction can simultaneously take a snapshot of the bag of marbles as it exists before we make any changes. Now one transaction finds the white marbles and turns them into black marbles. At the same time, the other transactions finds the black marbles - but only those marbles that where black when we took the snapshot - not those marbles that the first transaction changed to black - and turns them into white marbles. In the end, we still have a mixed bag of marbles with some white and some black. In fact, we have precisely switched each marble.

However, this is correct. Conceptually the two transactions are: Switch white marbles into black ones and switch black marbles into white ones so this is a expected outcome if both can work in parallel and that’s what snapshot isolation enables in contrast to serialization.

6. Anomalies

An anomaly is an inconsistency between different parts in the data. There are a number of different types with different causes.

Table 1. Anomalies

Anomaly

Description

Dirty Read

A Dirty Read occurs when one transaction reads a value that has been written by another still in-flight transaction. It does not matter if the value is finally committed or rolled back, it is still a dirty read.

Dirty Write

A Dirty Write occurs when one transaction overwrites a value that has previously been written by another still in-flight transaction.

Lost Update

Lost Update relates to concurrent reads and updates to data, in a system where readers do not block writers, so that writing is done based on reading data that is already being modified.

Non-Repeatable (Fuzzy) Read

Non-repeatable reads happen when a query returns data that would be different if the query were repeated within the same transaction. Non-repeatable reads can occur when other transactions are modifying data that a transaction is reading.

Phantom read

A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.

Read Skew

Read skew can occur when there are integrity constraints between two or more data items. When the transaction reads one item while the constrained item is modified, the transaction will see an inconsistent state, and generate the wrong output.

Write Skew

Write skew occurs when constrained items at least one safety feature for a system is disregarded, due to write-write synchronization problems, which causes Snapshot Isolation not to be serializable.

6.1. Write skew in detail

Consider two transactions, P and Q. P copies the value in a register x to y, and Q copies the value in a register y to x. There are only two serial executions of these two, P, Q or Q, P. In either, the end result is that x = y. However, Snapshot Isolation allows for another outcome:

  • Transaction 1 reads x

  • Transaction 2 reads y

  • Transaction 1 writes the value it read to y

  • Transaction 2 writes the value it read to x

In Snapshot Isolation this is still valid - both transaction were consistent and the writes didn’t overlap - but there is still an error: x and y have been swapped.

7. LeanXcale anomaly handling

The Snapshot Isolation criterion avoids all anomalies described in the standard.

For the write skew anomaly, LeanXcale provides specific SELECT FOR UPDATE semantics.

This means that when a SELECT is launched, all conflict managers are informed about rows involved in the SELECT.

Any other transactions are considered in conflict and are prevented from modifying these rows.

This prevents any write skew operation from happening.

7.1. LeanXcale isolation levels

LeanXcale supports two isolation levels: READ_COMMITTED, or Snapshot Isolation to provide read consistent view of the database to all transactions.

However, if you are used to READ_COMMITED, you may expect a different behavior when using snapshot isolation.

8. Durability

For a database durability guarantees that all committed transactions are stored permanently, no matter what happens to the platform running the database. Durability means that system crashes do not alter data.

Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.

8.1. LeanXcale durability

In LeanXcale, durability is attained through a set of distributed Logger components acting independently or in groups. This guarantees that you have logged replicas in case a machine and/or disk breaks down.

9. Conflicts

In databases conflicts occur when transactions perform actions that compromise the integrity or reliability of the data. Generally they are the effect of concurrency - interleaved execution of transactions.

There are three basic categories of conflict:

  • Write–write - Transactions overwrite uncommitted data, basically getting in each other’s way, typically by one transaction starting a write, and a different transaction commits before the first transaction. The second transaction value is then lost.

  • Read–write - Reads can’t be repeated, because the value changes while the transaction waits, mostly because a transaction reads a value, and then reads it again to perform a modification, but a different transaciton has changed it, and the first transaction will have the wrong result.

  • Write-Read - The read is "dirty", because one transaction reads data written by other transactions before they are committed.

9.1. Coding for LeanXcale Conflict Management

Since conflicts are managed at the row level, conflicts are very unlikely (unless there are a lot of concurrent users and there is a very specific hot spot in the data model).

However, the developer has to consider conflicts. This means that you should always handle the Abort exception in any INSERT, UPDATE or COMMIT operation and act accordingly instead of relying on the database making the session wait until the lock is released from another session and aquired by your sessipn.

9.2. Example

An example: Let’s imagine you are building a web sales solution and have a screen where users do the final purchase reducing the stock. Next pseudo-code shows how to handle this situation with LeanXcale.

stockReserved = False
while not stockReserved:
    Start Transaction
    Check there is stock
    If there is no stock: Show a message for the customer and Return
    TRY:
        Reserve the stock
        COMMIT
        stockReserved = True
    EXCEPTION CONFLICT:
        #Someone was just trying to reserve the same item and there was a conflict
        #Although Conflict means Abort/Rollback, explictily do Rollback to restart
        #transaction and release resources to restart and keep trying to reserve
        #Stock unless there is no stock
        ROLLBACK
    EXCEPTION OTHERS:
        Manage other Exceptions
#END of while

Do the payment process

If everything is OK:
    stockUpdated = False
    while not stockUpdated:
        TRY:
            Start Transaction
            Update the stock
            Send the order
            COMMIT
            stockUpdated = True
        EXCEPTION CONFLICT:
            #Someone was just trying to reserve the same item and there was a conflict
            #Do nothing keep trying to reserve Stock unless there is no stock
            ROLLBACK
    #END while

else #(something failed in the payment proccess):
    #Undo stock reservation:
    stockUpdated = False
    while not stockUpdated:
        TRY:
            Start Transaction
            Update the stock undoing reservation
            COMMIT
            stockUpdated = True
        EXCEPTION CONFLICT:
            #Someone was just trying to reserve the same item and there was a conflict
            #Do nothing keep trying to reserve Stock unless there is no stock
            ROLLBACK
    #END while
    COMMIT
#END if

Since in SQL transactions don’t need to be explicitly started because the start is implicit when you connect or you finish (with COMMIT or ROLLBACK) the previos transaction, the first part in Java using JDBC would be:

Also, using SQL and JDBC, the Conflict is not a Explicit Exception, but a state of an SQLException:

boolean stockReserved = false
do {
    try {
        int nstock = getStock(connection, Item);
        if (nstock == 0) {
            throw new NoStockException("No Stock for " + Item);
        }
        updateReservedStock(connection, Item, nstock - 1);
        CommitTransaction(connecction);
        stockReserved = true;

    } catch (SQLException sqlex) {

        if (sqlex.getSQLState()!=null && Arrays.binarySearch(CONFLICT_STATE,sqlex.getSQLState()) >= 0){
            RollbackTransaction(connection);
        }
        else {
            throw sqlex;
        }
    }
}
while (stockReserved == false);

10. LeanXcale Commit

LeanXcale does not use two-phase commit, but a novel and patented distributed approach to all the operations related with commit. At a very high level, we differentiate three stages in the commit process:

  • Durable: The information of the transaction is logged and the commit is guaranteed to be recoverable

  • Persisted: The information related to the transaction is persisted in the datastores

  • Visible: The information is available to be seen by any process accessing it

This division allows to distribute all transaction among scalable components, because you can scale the isolation of writes and the isolation of writes in a separable way.

When the data is both Persisted and Visible it is Readable for all other transactions.

11. Defer checks

As with conflicts, there are a set of checks that may be needed in statements and that require round trips.

A typical case is INSERT statements fail if there is another row with the same PRIMARY KEY or any UNIQUE INDEX already exists. On programatic operations this means an overhead that you may not want to pay at each operation because those checks can be defered to the point when you can batch several operations or do the COMMIT.

11.1. LeanXcale behavior

The default behavior for LeanXcale is to defer checks.

The main drawback when deferring checks is that it is difficult to know which specific operation checks are failing and, if several are failing you just have one input.

If you need detailed information, you can configure transactions to perform checks synchronously.

12. Conflict management

Conflict management relates to the way the database manages write-to-write conflicts.

A typical way to handle conflicts is to use locks, preventing one action to be performed while something else is happening.

12.1. Long and short locks:

Locks are used to ensure that transactions are well-formed. Locks can have different durations:

  • Long - locked until the end of the transaction

  • Short - locked until the action is completed

A transaction is considered well-formed if both writes and reads request a lock before performing the action on a data item or set of data items.

12.2. Lock phases

The standard method of doing transactions involves a two-phase method:

  • Expanding: aquire locks, do not release any locks

  • Shrinking: release locks, do not acquire any new locks

12.3. Lock types

Locks are normally a derivation of on of two types:

Using locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.

Two major types of locks are utilized:

  • Exclusive/Write - used when the transaction before modifying the item

  • Shared/Read - use when the transactions before reading the item

12.4. LeanXcale Conflict Management

LeanXcale manages conflicts at the row level. When two concurrent transactions try to modify the same row, one of the transactions is aborted.

This behavior is different to a lot of other transactional databases that typically lock the rows, forcing other sessions to wait until the lock is released.

12.4.1. Asynchronous / synchronous management

In LeanXcale conflicts can be managed either synchronous or asynchronous.

By default, LeanXcale manages conflicts asynchronously, which can yield a significant improvement in a distributed system.

This means that for a set of DML operations that are part of a transaction you are guaranteed that conflicts are resolved before COMMIT, but not when the SQL statements returns.

The main drawback in asynchronous conflict check is when there are a set of operations and you want to know which operation had the conflict.

If you need detailed information, you can configure transactions to perform conflict management synchronously.

12.4.2. Configuration

The type of conflict management to use is configured in the connection, by selecting one of the connection modes.

For more information see the connection modes section.

13. Visibility

The term visibility refers to when effects of transactions are available to other transactions and processes.

13.1. LeanXcale visibility

In LeanXcale, the visibility of updates is not connected to the atomic commits.

Any form of complex and time consuming coordination among components, such as a 2-phase commit protocol, is avoided.

Instead, transactions can commit fast and in parallel without compromising consistency.

14. Serializability

With using concurrency, the term serializability is used to describe equivalence between outcomes.

The concurrent set of transactions are called transaction schedule or transaction history. The same set of transactions can be done concurrently, or in sequence.

If the outcome of the transactions is the same in either case, the transactions are concurrent transaction schedule is considered serializable.

14.1. Purpose

The goal of serializability is to keep the data in the data item in a consistent state, and is considered a strong criterion for the correctness of a concurrent transaction schedule.

Schedules that are not serializable are likely to generate erroneous outcomes.

The idea is that if one transaction is correct by itself (and order is irrelevant), then a schedule that consists of a sequence of these transactions is correct. Therefore, if a concurrent schedule is equivalent, then it is also correct.

15. Connections

15.1. LeanXcale Connection Modes

When starting any connection you can define the connection mode. Basically, there are 3 connection modes:

  • RUN - Program mode

  • MANUAL - Interactive mode

  • STRONG - Session Mode

By default, all the connections will be MANUAL. If a specific client wants to connect with using a different connection mode, the client should use the property MODE with one of the accepted values (RUN, MANUAL or STRONG). This property is accepted only on JDBC connections. For other Query Engine connectors, you can just modify the default value. The defalut value can be modified in the LeanXcale configuration file. To modify the default value, please contact us on LeanXcale support team email

15.1.1. RUN/Program Mode

  • The isolation level is Snapshot Isolation

  • Provides default session consistency

  • Conflicts are resolved asynchronously

  • Checks are deferred and done asynchronously

This is the recommended mode for any program.

15.1.2. Manual/Interactive Mode

  • The isolation level is READ_COMMITTED

  • Provides default session consistency

  • Conflicts are resolved synchronously for each operation

  • Checks are done synchronously for each operation

This is a mode suited for a user connecting an interactive session so any operation gives back any possible conflict/error as soon as possible. It is not designed for performance because the overhead of resolving synchronously for operations.

If the user is running a big SQL scripts probably the first times it is better to use this mode despite other modes can provide higher performance.

15.1.3. Strong/Session Mode

  • The isolation level is Snapshot Isolation

  • Provides strong session consistency

  • Conflicts are resolved asynchronously

  • Checks are deferred and done asynchronously

16. Conflicts

In databases conflicts occur when transactions perform actions that compromise the integrity or reliability of the data. Generally they are the effect of concurrency - interleaved execution of transactions.

There are three basic categories of conflict:

  • Write–write - Transactions overwrite uncommitted data, basically getting in each other’s way, typically by one transaction starting a write, and a different transaction commits before the first transaction. The second transaction value is then lost.

  • Read–write - Reads can’t be repeated, because the value changes while the transaction waits, mostly because a transaction reads a value, and then reads it again to perform a modification, but a different transaciton has changed it, and the first transaction will have the wrong result.

  • Write-Read - The read is "dirty", because one transaction reads data written by other transactions before they are committed.

16.1. LeanXcale Conflict Management

LeanXcale manages conflicts at the row level. When two concurrent transactions try to modify the same row, one of the transactions is aborted.

This behavior is different to a lot of other transactional databases that typically lock the rows, forcing other sessions to wait until the lock is released.

16.1.1. Asynchronous / synchronous management

In LeanXcale conflicts can be managed either synchronous or asynchronous.

By default, LeanXcale manages conflicts asynchronously, which can yield a significant improvement in a distributed system.

This means that for a set of DML operations that are part of a transaction you are guaranteed that conflicts are resolved before COMMIT, but not when the SQL statements returns.

The main drawback in asynchronous conflict check is when there are a set of operations and you want to know which operation had the conflict.

If you need detailed information, you can configure transactions to perform conflict management synchronously.

16.1.2. Configuration

The type of conflict management to use is configured in the connection, by selecting one of the connection modes.

For more information see the connection modes section.

17. Readability

In the context of transactions, Readability is a property or a stage of the COMMIT. It states when the data affected by the transaction is readable by other transactions. in essence, the readability state indicates that the data has been transacte and can be relied on for some purpose.

17.1. LeanXcale Readability

LeanXcale uses a readability concept that consists of two parts: persistency and visibility.

When the data is both Persisted and Visible, it is Readable for all other transactions.

This division allows to distribute all transaction among scalable components, because you can scale the isolation of writes and the isolation of reads in a separable way.

For more details, see the section about conflict management.