Python

1. Two interfaces to access the same data

LeanXcale is a SQL database with NoSQL characteristics. As a relational key-value database, LeanXcale provides access to the same data through two independent APIs:
- An ANSI SQL 2003 interface that is powerful and easy to use but can be slow, so it is most suitable for complex queries (e.g., multi-join queries).
- A proprietary key-value interface, called KiVi, that is fast and provides all functionality except join operations, so it is most suitable for simpler operations where the overhead of SQL is too costly (e.g., updates and inserts).
Both interfaces interact with the same data and can be part of the same transaction.
2. SQL

2.1. SQLAlchemy Driver
SQLAlchemy is one of the most popular frameworks for database manipulation in Python. By using the custom dialect shipped with the LeanXcale driver, you can leverage the SQLAlchemy toolkit with LeanXcale through our implementation of the well-known PEP 249, Python Database API Specification v2.0.
2.2. SQLAlchemy resources
The official SQLAlchemy documentation is extensive, so here we focus on examples of performing basic operations over a LeanXcale database.
2.3. SQLAlchemy Installation
This section describes how to install the LeanXcale SQLAlchemy driver, which is available in two flavours: a wheel package installed with pip, or an egg package installed with Easy Install. Download your preferred package type for the Python SQLAlchemy driver from the Drivers page.
We recommend working within a virtual environment to properly manage dependency versions and avoid conflicts.
Two alternatives exist for managing the dependencies: either manage them yourself by downloading the required drivers from the Drivers page, or let pip or Easy Install resolve the dependencies and install the driver from the LeanXcale public repository.
2.3.1. PIP Installation
Unmanaged Installation
If you decide to download the driver manually and install using pip3, then you must execute:
pip3 install pyLeanXcale-1.6.6-py3-none-linux_x86_64.whl
This command installs all necessary dependencies required for the driver.
Managed Installation
Alternatively, you can add the LeanXcale public repository to your pip configuration so that pip takes care of finding and installing the required dependencies. In your environment, create a pip.conf file with the following content.
NOTE: You can find out where to place this pip.conf file by running pip config list -v
[global]
index = https://pypi.python.org/pypi/
index-url=https://pypi.python.org/simple/
extra-index-url = https://nexus.leanxcale.com/repository/pypi-releases/simple/
Then, install the driver with:
pip3 install pyLeanxcale==1.6.6
2.3.2. Easy Install Installation
Unmanaged Installation
Similar to the unmanaged installation with pip, the .egg LeanXcale Python driver can be downloaded manually and installed with Easy Install. To install the driver, run:
easy_install pyLeanXcale-1.6.6-py3-none-linux_x86_64.egg
Managed Installation
If you use Easy Install to install the driver from the public repository, then you must create a setup.cfg file in your environment with the following:
[easy_install]
index-url = https://nexus.leanxcale.com/repository/pypi-releases/simple
Then, install by running:
easy_install pyLeanxcale==1.6.6
2.4. Quick Start
This section features a set of code snippets to connect, read, insert, and update information through the SQLAlchemy interface.
2.4.1. Connect and create an engine
Connecting to the SQL interface requires some basic information:
- If you use LeanXcale as a Cloud Service, then you can find the IP or hostname for your instance in the LeanXcale Cloud Console.
- If you use LeanXcale on-premise, then provide the information from your installation.
In our examples, we include these parameters:
- IP / hostname: 123.45.67.89
- Port: 1529
- DB name: dbtest
- Username: user1
- Password: pass4user1
The starting point for connecting SQLAlchemy to LeanXcale is creating an engine, which uses a specific dialect to know how to interact with the underlying DBAPI.
from sqlalchemy import create_engine
leanxcale_IP = '123.45.67.89'
leanxcale_port = '1529'
database = 'dbtest'
user = 'user1'
pwd = 'pass4user1'
url = 'leanxcale://' + user + ':' + pwd + '@' + leanxcale_IP + ':' + leanxcale_port + '/' + database
# Autocommit is True by default; it can be set as a URL parameter
# Parallelism is False by default; it can be set as a URL parameter
url = 'leanxcale://' + user + ':' + pwd + '@' + leanxcale_IP + ':' + leanxcale_port + '/' + database + '?autocommit=False&parallel=True'
engine = create_engine(url)
To configure the engine, you must define the parameters for the connection URL by setting the username and password as well as the database to which you want to connect. Also, the IP address and port of the LeanXcale server must be configured.
The LeanXcale database can run in secure mode using the TLS protocol. If this feature is used, then you must pass the secure option as a connection argument when creating the engine, as in the following. With this option, your client uses HTTPS for secure communication.
engine = create_engine(url, connect_args={'secure': True})
To close the connection, call the following:
engine.dispose()
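Connections obtained from the engine can also be scoped with a Python context manager so that they are released automatically. This is standard SQLAlchemy usage, shown here as a minimal sketch:
# The connection is returned to the pool when the block exits
with engine.connect() as conn:
    # run queries here with conn.execute(...)
    pass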
2.4.2. Create a table
To create a table in LeanXcale, run the following code:
from sqlalchemy import MetaData, Table, Column, Integer, String
metadata = MetaData()
metadata.bind = engine
table = Table(
    'test_table', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String)
)
# create all elements in metadata objects
metadata.create_all(engine)
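To verify that the table was created, you can list the tables visible to the engine with SQLAlchemy's inspection API. This is a minimal sketch; it assumes the LeanXcale dialect supports table reflection:
from sqlalchemy import inspect
inspector = inspect(engine)
print(inspector.get_table_names())  # the list should include the new table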
2.4.3. Create an index
Several ways are available to create an index:
from sqlalchemy import Index
# index creation as a column parameter configuration
table_index_1 = Table(
    'test_table_index_1', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String, index=True)
)
# index creation as a separate object at table creation time
table_index_2 = Table(
    'test_table_index_2', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Index('index_2_name', 'name')
)
# index creation after table creation
table_index_3 = Table(
    'test_table_index_3', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
)
# create the specific tables
table_index_1.create(engine)
table_index_2.create(engine)
table_index_3.create(engine)
index_3 = Index('index_3_name', table_index_3.c.name)
index_3.create(engine)
2.4.4. Create a sequence
You can create sequences as standalone objects:
from sqlalchemy import Sequence
# assume an engine has been created and a metadata object bound to it
seq = Sequence('test_sequence', metadata=metadata)
metadata.create_all(engine)
Then, you can obtain values from the created sequence:
nextid = engine.connect().execute(seq)
Alternatively, you can attach a sequence to a table column so that its value is auto-incremented implicitly. Run the following code to create a table with a sequence as the primary key and then insert a new record:
from sqlalchemy import insert
table_sequence = Table(
    'table_sequence', metadata,
    Column('id', Integer, Sequence('table_sequence_seq'),
           primary_key=True),
    Column('name', String)
)
metadata.create_all(engine)
conn = engine.connect()
query = insert(table_sequence).values(name='John')
conn.execute(query)
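If you need the generated key, the insert result may expose it through SQLAlchemy's standard inserted_primary_key attribute. This is a sketch (the value 'Jane' is illustrative), and whether the sequence value is reported this way depends on the dialect:
result = conn.execute(insert(table_sequence).values(name='Jane'))
print(result.inserted_primary_key)  # e.g. [1], if the dialect reports it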
2.4.5. Insert, update and delete
Insert one record
from sqlalchemy import insert
conn = engine.connect()
query = insert(table).values(id=1, name='John')
query.compile()
conn.execute(query)
Insert many records
conn.execute(table.insert(), [
    {'id': 2, 'name': 'Bob'},
    {'id': 3, 'name': 'Alice'},
    {'id': 4, 'name': 'Matt'}
])
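The heading also covers updates and deletes; the following is a minimal sketch using the standard SQLAlchemy update() and delete() constructs over the same table (the values are illustrative):
from sqlalchemy import update, delete
# Update the name of the record with id 1
query = update(table).where(table.c.id == 1).values(name='Johnny')
conn.execute(query)
# Delete the record with id 4
query = delete(table).where(table.c.id == 4)
conn.execute(query)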
2.4.6. Read and scan data
from sqlalchemy import select
sel = select([table.c.id, table.c.name]).where(table.c.id == 1)
sel.compile()
result = conn.execute(sel)
for row in result:
    print(row)
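To scan the whole table, issue the same kind of select without a WHERE clause:
# Full table scan
scan = select([table])
for row in conn.execute(scan):
    print(row)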
2.4.7. Select query to join two tables
from sqlalchemy import join
# Equivalent join to
# SELECT *
# FROM students JOIN addresses ON students.id = addresses.st_id
j = students.join(addresses, students.c.id == addresses.c.st_id)
stmt = select([students, addresses]).select_from(j)
result = conn.execute(stmt)
for row in result.fetchall():
    print(row)
3. KiVi

3.1. Installation
3.1.1. Unmanaged installation
The Direct KiVi API Python library can be installed using pip. The wheel installer can be downloaded from the Drivers page.
Once downloaded, to install the wheel file, simply run in a terminal:
pip3 install pyKiVi-1.6.6-linux_x86_64.whl
3.1.2. Managed installation
Alternatively, you can add the LeanXcale public repository to your pip configuration so that pip takes care of finding and installing the required dependencies. In your environment, create a pip.conf file with the following content.
NOTE: You can find out where to place this pip.conf file by running pip config list -v
[global]
index = https://pypi.python.org/pypi/
index-url=https://pypi.python.org/simple/
extra-index-url = https://nexus.leanxcale.com/repository/pypi-releases/simple/
Then, install the Direct KiVi API with:
pip3 install pyKiVi
3.2. Configuration
Once the Direct KiVi API is installed, there is one step left: you need to add the Direct KiVi API installation directory to the LD_LIBRARY_PATH environment variable. To find out the installation directory, you can use the pip show utility:
pip3 show pyKiVi
This yields output similar to the following:
---
Metadata-Version: 2.1
Name: pyKiVi
Version: 1.7-SNAPSHOT
Summary: LeanXcale KiVi Direct Interface
Home-page: https://www.leanxcale.com/
Author: LeanXcale Team
Author-email: support@leanxcale.com
Installer: pip
License: Proprietary
Location: /home/ubuntu/temp/test-pykivi/lib/python3.5/site-packages
Requires:
Classifiers:
From that output, take the Location value and set it in the LD_LIBRARY_PATH environment variable. Note that you need to append the package name, pyKiVi, to the end of the location path, like the following:
export LD_LIBRARY_PATH="/home/ubuntu/temp/test-pykivi/lib/python3.5/site-packages/pyKiVi"
At this point, you are free to start using the LeanXcale Direct Python KiVi API.
3.3. Quick Start
The following sections include a set of code snippets to connect, read, insert, and update information through the Python KiVi direct API.
3.3.1. The representation of a record
In the Python direct API, a record (or row, or tuple) is represented as a list of values ordered according to the position of the columns in the table.
For filtering, you can refer to a column by using TableObject.field(position) or TableObject.field("columnName"). In the code below, you will find both of these ways.
Be aware that column names have to be written in capital letters.
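For example, a record for the person table used throughout this quick start is simply an ordered list of values, and a column can be referenced either by position or by its upper-case name. This is a sketch with illustrative values; it assumes a session and the person table defined in the sections below:
import datetime
TPerson = session.DB().getTable("person")
# One person record: values follow the column order of the person table
john = [1, "John", "Doe", "Main St", "555333695",
        "johndoe@nowhere.no", datetime.date(1970, 1, 1), 0, None]
# Equivalent ways to reference the NAME column in a filter
TPerson.field(1)       # by position (0-based)
TPerson.field("NAME")  # by upper-case name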
3.3.2. Connecting and starting a session
When you start a new session, it automatically connects. If this is the first session, then the API library manages the connections for you.
Depending on your configuration, you may need to define a PROXY or a security certificate for your connection.
from Session import Session, cTPLFMTIGN
from Database import Database
from Table import Table
from Filter import Filter, Op
import datetime
LXMETASERVER = "localhost"
PROXY = None
CERTIFICATE = None
SCHEMA = "APP"
session = Session("leanxcale://{}:14400/db".format(LXMETASERVER),
                  SCHEMA,
                  proxy=PROXY,
                  certificate=CERTIFICATE)
DB = session.DB()
In the following examples, the session creation is not repeated, but a session is still needed before running any of them. The import section above is also needed for the rest of the examples below.
The examples are based on the following simple SQL schema:
CREATE TABLE person (
    id BIGINT,
    name VARCHAR,
    "lastName" VARCHAR,
    address VARCHAR,
    phone VARCHAR,
    email VARCHAR,
    birthday DATE,
    numChildren INT,
    comment VARCHAR,
    PRIMARY KEY (id)
);
CREATE INDEX PHONEIX ON person(phone);
CREATE SEQUENCE SEQ_PERSON_ID;
3.3.3. Inserting, updating and deleting
- Insert one record (or row, or tuple):
TPerson = session.DB().getTable("person")
# Note the values for the fields are in the same order as the
# columns in the table
# Note you can set a NULL value for an unknown field (in this case ADDRESS)
TPerson.insert([10, "John2", "Doe", None, "555333695",
                "johndoe@nowhere.no", datetime.date(1970, 1, 1),
                0, "Nothing to comment"])
# Tuples are sent to the datastore when FLUSH is done
# For better performance it is recommended that you batch several
# inserts in the same flush
session.flush()
- Insert using a sequence:
TPerson = session.DB().getTable("person")
personId = session.DB().getSequence("SEQ_PERSON_ID")
# Note the values for the fields are in the same order as the
# columns in the table
# Note you can set a NULL value for an unknown field (in this case ADDRESS)
TPerson.upsert([personId.getAndIncrement(), "John", "Doe", None,
                "555333695", "johndoe@nowhere.no",
                datetime.date(1970, 1, 1), 0, "No comment either"])
session.flush()
- Update a record:
# The key should always be there
# The fields we are not changing should be set to cTPLFMTIGN
TPerson = session.DB().getTable("person")
updatepersoninfo = [10, cTPLFMTIGN, cTPLFMTIGN, cTPLFMTIGN,
                    "anotherPhone", "another@email.no",
                    cTPLFMTIGN, cTPLFMTIGN, cTPLFMTIGN]
TPerson.update(updatepersoninfo)
session.flush()
- Delete a record by its key:
TPerson = session.DB().getTable("person")
johnKey = [1]
TPerson.delete(johnKey)
session.flush()
- Delete with a filter. You need to do a SCAN and delete the matching records (we usually call them tuples, though this is not the same concept as a Python tuple). Note how two sessions are used so the deletes can be flushed independently of the ongoing scan.
# Create a new session for the deletes while doing the scan
session2 = Session("leanxcale://{}:14400/db".format(LXMETASERVER),
                   SCHEMA)
# Simple filter
TPerson = session.DB().getTable("person")
TForDelete = session2.DB().getTable("person")
flt = Filter(TPerson).filter(
    Filter.expr(Op.LT, TPerson.field("NUMCHILDREN"), 0))
cnt = 0
for tpl in TPerson.find().applyfilter(flt):
    TForDelete.delete(tpl)
    cnt += 1
    if cnt % 1000 == 0:
        session2.flush()
# Note that you cannot call session.flush() here
# because session has an open scan
session2.flush()
session.discard()
3.3.4. Reading and scanning Data
Retrieving a data row directly by its Key:
TPerson = DB.getTable("person")
key = [ 0 ] # The Key just needs the value for id field
resulttuple = TPerson.getTuple(key)
Perform a simple SCAN with the basic options:
TPerson = session.DB().getTable("person")
flt = Filter(TPerson).pkrange([20],[30])
session.initBatch() # This is not needed, but recommended
for tpl in TPerson.find().applyfilter(flt):
    processTuple(tpl)
# The same SCAN limited to the first 10 values. flt is released when the
# SCAN is closed. You have to create it again
flt = Filter(TPerson).pkrange([20],[30])
for tpl in TPerson.find().applyfilter(flt.first(10)):
    processTuple(tpl)
- Scan a table using a secondary index:
# This example assumes there is an index called PHONEIX on field PHONE
TPerson = session.DB().getTable("person")
flt = Filter(TPerson, index="PHONEIX").ixrange(["656434323"], ["676565454"])
for tpl in TPerson.find().applyfilter(flt):
    processTuple(tpl)
- Advanced finding and filtering:
The following examples demonstrate how to define a set of filters to be pushed to the KiVi datastore to obtain a result that matches specified conditions.
TPerson = session.DB().getTable("person")
# Fields can be referred to by position (starting at 0) or by name.
# Using the position is better in terms of performance, but in a SCAN
# the difference should not be significant
# The following is equivalent to SQL WHERE NUMCHILDREN > 4 AND NAME = 'John'
flt = Filter(TPerson).filter(
    Filter.expr(Op.AND,
                Filter.expr(Op.GT,
                            TPerson.field(7), 4),  # NUMCHILDREN > 4
                Filter.expr(Op.EQ,
                            TPerson.field("NAME"), "John")))
for tpl in TPerson.find().applyfilter(flt):
    processTuple(tpl)
# Now an example with Between
flt = Filter(TPerson).filter(
    Filter.between(TPerson.field("BIRTHDAY"),
                   datetime.date(1900, 1, 1), datetime.date(2000, 1, 1)))
for tpl in TPerson.find().applyfilter(flt):
    processTuple(tpl)
- Project:
A projection is the set of fields you want to obtain from the table. The projection may also include operations over the fields to be retrieved.
TPerson = session.DB().getTable("person")
session.initBatch()
# Fields 1 and 6 are NAME and BIRTHDAY
flt = Filter(TPerson).project([1, TPerson.field("LASTNAME"), 6])
for tpl in TPerson.find().applyfilter(flt):
    name = tpl[0]
    lastname = tpl[1]
    birthday = tpl[2]
    processTuple(tpl)
session.flush()
- Aggregations:
The most common simple aggregations can also be performed through the KiVi direct API. You must define an initial array with the fields to be used as the group-by key (if any), and then a list of aggregation expressions over some fields.
# Simple aggregation
# Equivalent to SQL: SELECT count(*) FROM PERSON
flt = Filter(TPerson).aggregate([],  # Group by is empty
                                [Filter.aggr(Op.ADD, 1)])
for tpl in TPerson.find().applyfilter(flt):
    pass
numPeople = tpl[0]
print("Count: {}".format(numPeople))
# Group by aggregation
# Equivalent to SELECT NAME, COUNT(*), SUM(NUMCHILDREN) ... GROUP BY NAME
flt = Filter(TPerson).aggregate([TPerson.field("NAME")],
                                [Filter.aggr(Op.ADD, 1),
                                 Filter.aggr(Op.ADD, TPerson.field(7))])
for tpl in TPerson.find().applyfilter(flt):
    name = tpl[0]
    count = tpl[1]
    sum = tpl[2]
    processTuple(tpl)
session.discard()
# Adding a filtering condition before the Group by aggregate
onedate = datetime.date(1970, 1, 1)
flt = Filter(TPerson).filter(
    Filter.expr(Op.GT, TPerson.field("BIRTHDAY"), onedate)
).aggregate(
    [TPerson.field("NAME")],  # Group by NAME
    [Filter.aggr(Op.ADD, 1),
     Filter.aggr(Op.MIN, TPerson.field("NUMCHILDREN"))])
for tpl in TPerson.find().applyfilter(flt):
    processTuple(tpl)
3.4. Timestamps and dates
Across the examples above, we have used datetime.date for SQL date types. You can use datetime.datetime for timestamps. You can also use plain integer types, as long as you are aware that (see the sketch after this list):
- TIMESTAMP represents the epoch time in microseconds; time.time() * 1000000 provides that value.
- DATE represents the number of days since 1900-01-01.
- TIME represents the time in seconds.
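A minimal sketch of those integer encodings (the interpretation of TIME as seconds since midnight is an assumption):
import datetime
import time
# TIMESTAMP: epoch time in microseconds
ts_micros = int(time.time() * 1000000)
# DATE: number of days since 1900-01-01
date_days = (datetime.date(1970, 1, 1) - datetime.date(1900, 1, 1)).days
# TIME: time in seconds (assumed to be seconds since midnight)
t = datetime.time(13, 30, 0)
time_secs = t.hour * 3600 + t.minute * 60 + t.second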
3.5. Session management
The session defines the context in which operations are done, and there are some basic operations that you need to manage. Specifically, the following four are important:
- session.beginBatch(): begins the session context for a set of operations. You cannot call beginBatch without having flushed or discarded the previous batch.
- session.initBatch(): very similar to beginBatch. The only difference is that with initBatch you can set parameters so the batch is flushed automatically; session.initBatch(batchsize=1000) forces a flush every 1000 insert/upsert/update/delete operations.
- session.flush(): forces the operations in the batch to be sent to the datastores. A flush automatically begins a new batch.
- session.discard(): discards any pending operation. This is usually used after find() operations to clear resources. A discard automatically begins a new batch.
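Putting these together, a minimal sketch of a typical write batch over the person table from this quick start (the generated values are illustrative):
TPerson = session.DB().getTable("person")
session.initBatch(batchsize=1000)  # auto-flush every 1000 write operations
for i in range(5000):
    TPerson.upsert([i, "Name{}".format(i), "Last", None, None, None,
                    datetime.date(1970, 1, 1), 0, None])
session.flush()  # send any remaining operations; a new batch begins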
3.6. Upsert vs. Insert and Update
TableObject.upsert() is much more efficient than insert() or update(), because insert and update check for the tuple's existence: with insert you want to be sure that the record does not already exist, while with update you want to be sure that there is something to update.
Most often, however, the developer already knows whether the record is there, and upsert is the best choice. Checking for the existence of keys implies another round trip, and in batch operations a failed check raises an exception for the full batch when flush() is done.
Any insert can be expressed as an upsert(), and an update can also be expressed as an upsert() by using the cTPLFMTIGN value for the fields you do not want to change (see the sketch below).
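For example, the earlier update of person 10 could be written as an upsert, assuming (as described above) that upsert() accepts cTPLFMTIGN for the unchanged fields:
TPerson = session.DB().getTable("person")
TPerson.upsert([10, cTPLFMTIGN, cTPLFMTIGN, cTPLFMTIGN,
                "anotherPhone", "another@email.no",
                cTPLFMTIGN, cTPLFMTIGN, cTPLFMTIGN])
session.flush()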
4. Other Examples
We feature additional detailed examples and real-world scenarios on our blog. To learn more about how you can connect your Python application to LeanXcale through SQL and the direct KiVi API, see:
- Tweet Sentiment Analysis on a LeanXcale database: explore how to perform analytical queries using SQL with Python, followed by integrating Natural Language Processing (NLP) techniques to perform sentiment analysis on tweets.