One area in which the MySQL stored program language really shines is the ease with which you can include SQL inside of stored program code (this was, after all, one of the key motivations for the introduction of this functionality in MySQL). While you might occasionally write stored programs without any SQL, it would be almost completely pointless to use stored programs if it weren for their ability to issue SQL.
Best practices related to SQL inside of MySQL stored programs are, therefore, among the most important in this chapter.
SQL-01: Start a transaction explicitly with the START TRANSACTION statement |
Although MySQL will automatically initiate a transaction on your behalf when you issue DML statements, you should issue an explicit START TRANSACTION statement in your program to mark the beginning of your transaction.
Its possible that your stored program might be run within a server in which autocommit is set to TRUE, and by issuing an explicit START TRANSACTION statement you ensure that autocommit does not remain enabled during your transaction. START TRANSACTION also aids readability by clearly delineating the scope of your transactional code.
SQL-02: Don leave transactions "dangling" |
Once you start a transaction, you should take responsibility for completing the transaction. Since transactions lock rows and potentially block other transactions, you need to ensure that transactions do not persist indefinitely. Generally, you should place the START TRANSACTION and COMMIT or ROLLBACK statements in the same stored program. This program may also call other programs, and you need to make sure that these called programs do not contain transactional code.
There are some exceptions to this recommendation. In particular, modular design might prompt you to break down a transaction into separate modules and control the overall transaction state from a master procedure.
SQL-03: Avoid use of savepointsthey can obscure program logic and reduce program efficiency |
Savepoints allow you to define a point within a transaction to which you can roll back without losing all of the changes made by the transaction. In essence, a savepoint facilitates the "partial rollback" of a transaction.
Indiscriminate use of savepoints can lead to inefficient and hard-to-maintain code. This is because when you roll back to a savepoint, your program flow is harder to follow, and you have almost by definition wasted system resources by issuing DML that you later aborted.
Quite often, you will find that instead of rolling back to a savepoint, you can simply issue a SELECT statement to validate an operation prior to actually issuing the DML. This technique was demonstrated in Chapter 8.
A valid use of a savepoint is within a stored program that you are using to execute a "nested" transaction without affecting the status of a transaction that may be in progress in the calling program. The "nested" program creates a savepoint and rolls back to that savepoint if any errors occur. In this way the procedure could be safely called by a program that has an open transaction, since any rollback issued in the nested program would affect only statements issued in that program.
SQL-04: Use an appropriate locking strategy |
There are two major patterns in transaction management: the optimistic locking strategy and the pessimistic locking strategy.
The pessimistic locking strategy assumes that concurrent updates are quite likely. To prevent this, the transaction locks rows as they are read. Other transactions that want to update the row must wait until the pessimistic transaction ends.
The optimistic locking strategy assumes that in the period of time between a user reading and then updating a row, it is unlikely that another user will attempt to update that same row. Of course, optimism in and of itself is not sufficient; when following this locking strategy, the program should check to ensure that the row has not been updated, immediately prior to the update. If the row has been updated, then the transaction is aborted.
Each locking strategy is based on assumptions regarding the behavior of other transactions or application users. Each has different implications for the duration of any locks acquired during the transaction and the possibility that a transaction will be aborted. Make sure that you weigh carefully the implications of the two strategies and pick the approach that best suits your application.
SQL-05: Keep transactions small |
The larger the transaction, the more likely it is that the transaction will lock rows needed by another transaction, and the greater the chance that a deadlock might occur. Transactions should therefore usually be no larger than is absolutely necessary.
SQL-06: Always reset the NOT FOUND variable after completing a cursor loop |
You should usually terminate a cursor loop when a CONTINUE handler for the NOT FOUND condition fires and modifies the value of a status variable. For instance, in the following fragment, the CONTINUE handler sets the v_last_row_fetched variable to 1, and we test this value after each FETCH call:
DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_last_row_fetched=1;
OPEN cursor1;
cursor_loop:LOOP
FETCH cursor1 INTO v_customer_name, v_contact_surname, v_contact_firstname;
IF v_last_row_fetched=1 THEN
LEAVE cursor_loop;
END IF;
-- Do something with the row fetched.
END LOOP cursor_loop;
CLOSE cursor1;
SET v_last_row_fetched=0;
It is important to reset this status value to 0 after the cursor loop terminates; otherwise, subsequent or nested cursor loops may terminate prematurely.
The following code incorrectly fetches employees for only a single department, because after the first cursor loop, the status variable continues to indicate that the last row has been fetched:
DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_not_found=1; SET v_dept_id=1; WHILE( v_dept_id<=10) DO OPEN dept_emp_csr; emp_loop:LOOP FETCH dept_emp_csr INTO v_employee_id; IF v_not_found THEN LEAVE emp_loop; END IF; CALL process_employee( v_employee_id); END LOOP; CLOSE dept_emp_csr; SET v_dept_id= v_dept_id+1; END WHILE;
SQL-07: Use SELECT FOR UPDATE when retrieving rows for later update |
Use the SELECT FOR UPDATE statement to request that locks be placed on all rows identified by the query. You should do this whenever you expect to change some or all of those rows, and you don want another session to change them out from under you. Any other session trying to update the rows, or lock the rows (perhaps using FOR UPDATE), will have to wait.
Here we are processing a special bonus payment for needy employees. We issue the FOR UPDATE clause so that the rows concerned are locked until our transaction completes:
CREATE PROCEDURE needy_bonus( ) BEGIN DECLARE v_employee_id INT; DECLARE v_salary NUMERIC(8,2); DECLARE v_last_emp INT DEFAULT 0; DECLARE emp_csr CURSOR FOR SELECT employee_id,salary FROM employees WHERE salary <45000 FOR UPDATE; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_last_emp=1; START TRANSACTION; OPEN emp_csr; emp_loop:LOOP FETCH emp_csr INTO v_employee_id, v_salary; IF v_last_emp THEN LEAVE emp_loop; END IF; CALL grant_raise( v_employee_id, v_salary); END LOOP emp_loop; CLOSE emp_csr; SET v_last_emp=0; COMMIT; END;
You can also use the LOCK IN SHARE MODE clause to lock the rows against update but continue to allow reads.
SQL-08: Avoid including SQL in functions that may be used in SQL |
You are free to include SQL statements within stored functions (with the exception of SQL statements that return result sets to the calling program). You should, however, be very wary of doing so if you think that your stored function might itself be called inside a SQL statement.
When you use a function that contains SQL in a SQL statement, you are effectively "nesting" two SQL statements. For every row returned by the "outer" SQL, you will have to execute the "inner" SQL. Such nested SQL statements can exhibit extremely unpredictable or undesirable performance.
For instance, consider the simple stored function below:
CREATE FUNCTION cust_contact_name (in_customer_id INT) RETURNS VARCHAR(100) READS SQL DATA BEGIN DECLARE v_contact_name VARCHAR(100); SELECT CONCAT(contact_firstname, ,contact_surname) INTO v_contact_name FROM customers WHERE customer_id=in_customer_id ; RETURN( v_contact_name); END$
It contains an efficient query, but nevertheless, if we include it in a query against the customers table as follows:
SELECT cust_contact_name(customer_id) FROM customers
our execution time is about five times greater than if we performed the same operation within the SQL itself:
SELECT CONCAT(contact_firstname, , contact_surname) FROM customers
The situation becomes even worse if the SQL inside the function is not completely optimized. In Chapter 10 we provide an example in which the use of a stored function inside a SQL statement lengthens execution time by a factor of 300!