Preparing the SQL statement
Preparing the SQL statement compiles it and caches its query plan on the database server.
So, when we call the execution of the prepare statement, it simply assigns values
to the parameters of the SQL statement and executes the final query plan,
avoiding SQL compilation and query plan optimization.
Here, we can see the new ingestData
method that prepares the statement just before
the main loop as part of the try header.
public void ingestData() {
try (LoansReader csv = new LoansReader();
Connection connection = utils.connect();
PreparedStatement prepStmt = connection.prepareStatement(utils.SQL_INSERT_PREPARED))
{
Loan loan;
while ((loan = csv.read()) != null) {
setParams(prepStmt, loan);
prepStmt.executeUpdate();
connection.commit();
stats.send(1);
}
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
We use an auxiliary method setParams
to set the params of each prepared statement:
private void setParams(PreparedStatement prepStmt, Loan loan) throws SQLException {
Object[] args = utils.getParams(loan);
for (int j = 0; j < args.length; j++) {
prepStmt.setObject(j + 1, args[j]);
}
}
Problem:
If we take a closer look at the code, we see that for each row insert, we pay the cost of doing a round trip from the application to the database server, that is, invoking the insert, executing the insert at the server, and replying the insert, and so on. Actually the invocation is actually expensive compared to the row insert itself executed at the server. In fact, if we look closely at the cost of the call, we are creating a message that requires us to create the message header (fixed cost) and then serialize the tuple and add it to the message (variable cost).
data:image/s3,"s3://crabby-images/0125e/0125ea2644adb3b6afe794b4a4397efdb7a54b7f" alt="Test 4 Error: Inserting Rows One by One"
Solution:
By resorting to batching, we can amortize the fixed cost of each call to the database server among multiple row insertions. Through batching, the fixed cost is paid once for a batch of inserts. So the fixed cost is divided by the number of inserts performed in the batch, thus reducing the overall cost.
data:image/s3,"s3://crabby-images/7e3b4/7e3b41b65c9ac5b49a38315bdef3f5b2a6f5a00f" alt="Test 5: Batching"