User-Defined Functions | Mastering MySQL 4

You would mostly write UDFs in C or C++. You can use them with binary or source distributions that have been configured with --with-mysqld-ldflags=-rdynamic. Once added, UDFs are always available when the server restarts, unless the --skip-grant-tables option is used.

There are two kinds of UDFs: standard and aggregate. Standard UDFs are like ordinary built-in functions such as POW() and SIN() and work on a single row of data, and aggregate functions are similar to the built-in SUM() and AVG() functions that work with groups.

To implement a UDF you need to do the following:

Write the C or C++ functions (you can use other languages as long as you can compile them into a native code shared library).
Compile and install the library.
Load the UDF into MySQL.

Once a UDF is added, details are stored in the func table in the mysql database. The func table looks as follows:

mysql> SHOW COLUMNS FROM func; +-------+------------------------------+------+-----+----------+-------+ | Field | Type                         | Null | Key | Default  | Extra | +-------+------------------------------+------+-----+----------+-------+ | name  | char(64) binary              |      | PRI |          |       | | ret   | tinyint(1)                   |      |     | 0        |       | | dl    | char(128)                    |      |     |          |       | | type  | enum('function','aggregate') |      |     | function |       | +-------+------------------------------+------+-----+----------+-------+ 4 rows in set (0.00 sec)

The name field contains the name of the UDF (and the name of the main C/C++ function). The ret field indicates whether the UDF can return nulls, the dl field indicates the name of the library containing the UDF (many UDFs can be bundled into one library), and the type field indicates whether the UDF is a standard or aggregate UDF.

Note

If you've upgraded from an older version of MySQL, you may not have all the columns. Run the mysql_fix_privilege_tables script (in the bin directory) to add the missing columns to your table.

In this section, you'll first look at compiling and installing the sample UDFs that come with a MySQL distribution and then go on to writing your own. Before you get started, create and add records to a small table, which you'll use to test the UDFs once you've added them:

mysql> USE firstdb; Database changed mysql> CREATE TABLE words (id tinyint(4), word varchar(50)); Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO words VALUES (1,'aeiou'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (2,'bro'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (3,'so'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (4,'kisso'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (5,'lassoo'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (2,'bro'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (3,'so'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (4,'kisso'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (4,'kisso'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO words VALUES (5,'lassoo'); Query OK, 1 row affected (0.00 sec)

MySQL comes with five sample UDFs, bundled in the file udf_example.cc, usually stored in the mysql/sql directory of a source distribution. You'll need to compile this file as a sharable object file. On Unix systems, these files usually have a .so extension, and on Windows the extension is usually .dll. The command (on Unix) you'll use will be something like this:

% gcc -shared -o udf_example.so udf_example.cc

This book is not a guide to programming and compiling, so although the topics are explained reasonably thoroughly, it cannot cover all possible combinations. You may need to be experienced or get some assistance in those areas to get the most out of this chapter.

To find the correct compiler options for your system, you can use the make utility, which checks for dependencies. Each system will differ, but after running make you may get output something like this:

% make udf_example.o g++ -DMYSQL_SERVER -DDEFAULT_MYSQL_HOME="\"/usr/local\""  -DDATADIR="\"/usr/local/var\"" -DSHAREDIR="\"/usr/local/share/mysql\"" -DHAVE_CONFIG_H -I../innobase/include -I./../include -I./../regex  -I. -I../include -I. -O3 -DDBUG_OFF   -fno-implicit-templates  -fno-exceptions -fno-rtti -c udf_example.cc

Take the options supplied, and use them to compile the UDF. Again, your system may differ—some will require the -c option to be left out, others will need it. You should know your system well enough, know someone who does, or have enough patience to try again if your first (and second and third…) attempts result in a dead end. Your final command may look something like this:

% gcc -shared -o udf_example.so udf_example.cc -I../innobase/include–  -I./../include -I./../regex -I. -I../include -I.

Note

On some systems you'll have to do this in two steps: first compile udf_example.cc as udf_example.o, then create the shared library from udf_example.o (using gcc -shared -o udf_ example.so udf_example.o).

Once you've compiled the UDF, move it to the place your shared libraries usually go. With Unix systems, it's any directory searched by ld (mostly /usr/lib or /lib), or you can set an environment variable to point to the directory you're storing your library. Typing man dlopen will give you the name of the environment variable (usually LD_LIBRARY or LD_LIBRARY_ PATH). You would set this in your startup script (mysql.server or mysqld_safe). With Windows systems, you'll usually place the UDF in the WINDOWS\System32 or WINNT\System32 directory. Copy your compiled file to the appropriate location, for example:

% cp udf_example.so /usr/lib

Once you have placed the file, some systems may require you to create the necessary links (such as by running ldconfig) or restarting MySQL before you can load the function.

To load the UDF from the MySQL command line, use the CREATE FUNCTION statement. The syntax is as follows:

CREATE [AGGREGATE] FUNCTION function_name RETURNS {STRING|REAL|INTEGER}        SONAME shared_library_name

The example comes with a number of UDFs (you can bundle more than one in a single library). For now, you'll load just three of the functions, as follows:

mysql> CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.so"; Query OK, 0 rows affected (0.03 sec) mysql> CREATE FUNCTION myfunc_double RETURNS REAL SONAME "udf_example.so"; Query OK, 0 rows affected (0.00 sec) mysql> CREATE AGGREGATE FUNCTION avgcost RETURNS REAL SONAME "udf_example.so";   Query OK, 0 rows affected (0.00 sec)

Now you can test the new UDF. To see what all the UDFs do, you should look at the udf_ example.cc file. The metaphon UDF (the correct name for the algorithm is actually metaphone) takes a string and returns a result based on the way the string sounds. It is similar to the more well-known soundex algorithm, but tuned more for English.

mysql> SELECT METAPHON(word) FROM words; +----------------+ | METAPHON(word) | +----------------+ | E              | | BR             | | S              | | KS             | | LS             | | BR             | | S              | | KS             | | KS             | | LS             | +----------------+ 10 rows in set (0.00 sec)

This is not particularly useful, but at least now you know how to add functions. Test that the aggregate function is working by using this code:

mysql> SELECT AVGCOST(id,1.5) FROM words; +-----------------+ | AVGCOST(id,1.5) | +-----------------+ |          1.5000 | +-----------------+ 1 row in set (0.00 sec)

There is only one result, as the function works with a group. You didn't use a GROUP BY clause, so the entire result set is taken as a single group. If you grouped by the contents of id, you'd get five results, as there are five unique id values:

mysql> SELECT id,AVGCOST(id,1.5) FROM words GROUP BY id; +------+-----------------+ | id   | AVGCOST(id,1.5) | +------+-----------------+ |    1 |          1.5000 | |    2 |          1.5000 | |    3 |          1.5000 | |    4 |          1.5000 | |    5 |          1.5000 | +------+-----------------+

You can drop a UDF with the DROP FUNCTION statement, for example:

mysql> DROP FUNCTION myfunc_double; Query OK, 0 rows affected (0.01 sec)

You can view the list of available UDFs by looking at the contents of the func table in the mysql database, as follows:

mysql> SELECT * FROM mysql.func; +----------+-----+----------------+-----------+ | name     | ret | dl             | type      | +----------+-----+----------------+-----------+ | metaphon |   0 | udf_example.so | function  | | avgcost  |   1 | udf_example.so | aggregate | +----------+-----+----------------+-----------+ 2 rows in set (0.01 sec)

This implies that the user adding or removing the function needs to have INSERT or DELETE permission for the func table, or mysql database. You would usually only give this to an administrator, as besides the security risk of access to the mysql database, a UDF can potentially cause a lot of harm.

Now let's create a UDF from scratch. You're first going to create a standard (not aggregate) UDF, called count_vowels.

Standard UDFs

A standard UDF has one main function, which is named the same as the UDF and is required, and two optional functions, which are named similarly but with _init and _deinit appended to the end. All these functions must be in the same library.

The Init Function

The init function is the initialization function, called once at the beginning of processing the UDF. It checks the arguments passed to the UDF (for example, whether they're the right type or number) and specifies details about the result (whether it can be NULL, how many decimal places it has, and so on).

It is declared as follows:

my_bool function_name_init(UDF_INIT *initid, UDF_ARGS *args, char *message);

The function returns a Boolean type, which you set to false if the function did not pick up any errors or true if an error was spotted.

The initd Parameter

The initd parameter is the UDF's main data structure. It is passed to all three functions. Any changes to the default settings are made in this function. The structure contains the following members:

my_bool maybe_null This is a Boolean that specifies whether the UDF can return a NULL value (if set to true) or not (if set to false). It's set to false by default, unless any of the function's arguments can be NULL.

unsigned int decimals Specifies the maximum number of decimals that can be returned. By default it takes the most number of decimals passed into the main function by any of the arguments. So, if 203.2, 219.12, and 341.456 are passed, decimals would default to 3 (based on the .456 of the last argument). You can set a maximum limit in the init function, though.

unsigned int max_length Specifies the maximum length of the returned result. For string UDFs, the default is the length of the longest string argument. For integers, the default is 21 characters (including the sign). For reals, the default is 13 (including the sign and decimal point) plus the number of decimals.

char *ptr This is a pointer that the UDF can use—for example, to pass data across all three functions. Allocate the memory in the init function if the pointer is used for new data.

The args Parameter

The second parameter, args, is a structure containing the arguments passed in from the query. It contains the following members:

unsigned int arg_count  Contains the number of arguments passed in from the query. If your UDF takes a set number of arguments, check this value for error handling.

enum Item_result *arg_type  Contains an array of types. Each element corresponds to one of the arguments, so the total number of elements is the same as the value of arg_count. The possible types are STRING_RESULT, INT_RESULT, and REAL_RESULT. Use this for error checking, or cast the argument into the specific type you require.

char **args  Contains an array of the actual arguments passed from the query. If the argument is constant, it can be accessed as args->args[i], where i is the element number of the argument. For a nonconstant args->args[i] is 0, as the actual value from the row is passed to the main function. This is discussed further in "The Main Function."

unsigned long *lengths An array containing the maximum possible string length for each argument passed by the query. It differs in the main function, so see the discussion in "The Main Function."

The message Parameter

The message parameter contains a character pointer, which is used for any error messages that arise during the initialization. It should always be given a value when the init function returns true, indicating an error. The default character buffer is 200 bytes, but you'd usually want a much shorter error message than this (80 characters is the width of a standard terminal). You must also terminate it with a null byte.

The Main Function

The main function is the only one required for a standard UDF, and it is called once for each row returned from the query. The return value from this function is the same as the return value for the whole UDF and can be a string, a real, or an integer. The function should be declared as one of the following corresponding to its return value.

If the UDF returns a string:

char *function_name(UDF_INIT *initid, UDF_ARGS *args, char *result,  unsigned long *length, char *is_null, char *error);

If the UDF returns a real:

double function_name(UDF_INIT *initid, UDF_ARGS *args,  char *is_null, char *error);

If the UDF returns an integer:

long long function_name(UDF_INIT *initid, UDF_ARGS *args,  char *is_null, char *error);

For numeric types, the return value of the main function is simply the value. If it's a string type, the return value is a pointer to the result, with the length stored in the length argument. The result buffer is defaulted to 255 bytes, so if the result is less than this, the pointer should be the result pointer passed into the main function. If it's more, it should be the pointer allocated in the init function (you'll need to allocate space with malloc() and then later deallocate the space in the deinit function).

The initd Parameter

All of the attributes of this structure (discussed previously) are available to the main function. There should be no need to modify any of these values in the main function.

The args Parameter

The attributes from this structure were discussed previously. In the main function, however, the args array contains the actual arguments passed from each row to the function. Because these can differ in type, you must case them to the appropriate type. For an argument of type INT_ RESULT, cast args->args[i] to a long long, as follows:

long long int_val; int_val = *((long long*) args->args[i]);

For an argument of type REAL_RESULT, cast to a double, as follows:

double   real_val; real_val = *((double*) args->args[i]);

For an argument of type STRING_RESULT, the string is available as args->args[i], and the length of the string as args->length[i], excluding any trailing nulls (for numeric types, args->length[i] still contains the maximum length it was given in the init function).

The length Parameter

This is a pointer to an integer that you set to the length of the returned value (excluding trailing nulls).

The is null Parameter

Set this to 1 if the UDF returns a null value; otherwise, leave it at the default, 0.

The result Parameter

This is a pointer to a character array and is where you place the return value of the UDF; 255 bytes are allocated, so if the result is longer, you'll need to use the ptr parameter from the init function. You'll need to allocate and deallocate this memory.

The Deinit Function

This function frees memory allocated by the init function and takes care of any other required cleanup, specifically the pointer that may have been allocated in the init function. You declare the function as follows:

void function_name_deinit(UDF_INIT *initid)

Creating a Sample Standard UDF

After all the introductions, let's create a small UDF called count_vowels. It takes one argument only (which must be a string) and returns the number of vowels in the string. Listing 6.1 contains the UDF.

Listing 6.1: count_vowels.cc

#ifdef STANDARD #include <stdio.h> #include <string.h> #else #include <my_global.h> #include <my_sys.h> #endif #include <mysql.h> #include <m_ctype.h> #include <m_string.h> /* These must be right or mysqld will not find the symbol! */ extern "C" { my_bool count_vowels_init(UDF_INIT *initid, UDF_ARGS *args, char *message); void count_vowels_deinit(UDF_INIT *initid); long long count_vowels(UDF_INIT *initid, UDF_ARGS *args,  char *is_null, char *error); } /* Makes sure there is one argument passed, and that it's a string. */ my_bool count_vowels_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {   if (args->arg_count != 1 || args->arg_type[0] != STRING_RESULT) {     strcpy(message,"You can only pass one argument, and it must be a string");     return 1;   }   return 0; } /* no need for a deinit function, as we don't allocate extra memory */ void count_vowels_deinit(UDF_INIT *initid){ } /* count the number of vowels in the string */ long long count_vowels(UDF_INIT *initid, UDF_ARGS *args,char *is_null,–  char *error) {  long long num_vowels = 0; /* the same type as the result of the function */  char *word = args->args[0];  /* pointer to string */  int i = 0; /* to loop through the word */  char c;    /* to contain the letter  */  while ( ( c = word[ i++ ] ) != '\0' ) {   switch ( c ) {    case 'a':    case 'e':    case 'i':    case 'o':    case 'u':    num_vowels++; /* if the letter in c is a vowel, increment the counter */   }  }  return num_vowels; }

Once you've saved the file, make and compile it, and then copy it to the directory where you place your libraries, as discussed earlier:

% make count_vowels.o g++ -DMYSQL_SERVER -DDEFAULT_MYSQL_HOME="\"/usr/local/mysql\""  -DDATADIR="\"/usr/local/mysql/var\""  -DSHAREDIR="\"/usr/local/mysql/share/mysql\""  -DHAVE_CONFIG_H -I../innobase/include -I./../include  -I./../regex -I. -I../include -I. -O3 -DDBUG_OFF  -fno-implicit-templates -fno-exceptions -fno-rtti -c repeat_str.cc % gcc -shared -o count_vowels.so count_vowels.cc -I../innobase/include–  -I./../include -I./../regex -I. -I../include -I.

Now connect to MySQL, load the function, and run a test:

mysql> CREATE FUNCTION count_vowels RETURNS INTEGER SONAME "count_vowels.so"; Query OK, 0 rows affected (0.02 sec) mysql> SELECT id,word,count_vowels(word) FROM words; +------+--------+--------------------+ | id   | word   | count_vowels(word) | +------+--------+--------------------+ |    1 | aeiou  |                  5 | |    2 | bro    |                  1 | |    3 | so     |                  1 | |    4 | kisso  |                  2 | |    5 | lassoo |                  3 | |    2 | bro    |                  1 | |    3 | so     |                  1 | |    4 | kisso  |                  2 | |    4 | kisso  |                  2 | |    5 | lassoo |                  3 | +------+--------+--------------------+ 10 rows in set (0.00 sec)

If you passed a nonstring argument such as from the id field, or more than one argument, you'd get the error message you specified:

mysql> SELECT id,word,count_vowels(id) FROM words; ERROR: You can only pass one argument, and it must be a string

Warning

If you make a change to the UDF, be sure to DROP the function first from MySQL before you upload it again. You're quite likely to crash MySQL and have to restart if you don't!

Understanding Aggregate Functions

Aggregate functions are those that can be used with a GROUP BY clause, such as SUM() and AVG(). To create an aggregate UDF, you use the same functions as with a standard UDF, except that there are two more required: the reset and add functions. The behavior of the other functions is different as well:

The Reset Function This function is called at the beginning of each new group. Data used for group calculations are reset here. You declare the function as follows:
char *xxx_reset(UDF_INIT *initid, UDF_ARGS *args, 
                char *is_null, char *error); 
The Add Function  This is called for each row of the group except the first row. You'll probably want it to be called for every row, in which case you'll need to call it from within the reset function.

The Main Function  The main function is only called once per group of data (at the end), so it performs any necessary calculations on the entire group of data (usually accessed by initd->ptr).

The Init Function  Behaves the same as with a standard UDF, except the ptr attribute becomes much more important in an aggregate function. It stores data about each group, which is added within the add function. The main function can then access it to get data about the entire group.

The Deinit Function  Plays the same role as with a standard UDF, except that it will almost always exist, as you need to clean up ptr.

Creating a Sample Aggregate UDF

For the aggregate UDF, you're going to make some changes to the count_vowels UDF so that it counts groups of vowels. You're going to create a structure called data with an element count. You'll increment data->count each time you encounter a vowel in the add function (which is called every row), and reset the value in the reset function (called once per group). Because the add function is not explicitly called for the first row, you'll call it from the reset function to make sure the first row of the group is also counted. Listing 6.2 contains the aggregate UDF.

Listing 6.2: count_agg_vowels.cc

#ifdef STANDARD #include <stdio.h> #include <string.h> #else #include <my_global.h> #include <my_sys.h> #endif #include <mysql.h> #include <m_ctype.h> #include <m_string.h>           // To get strmov() #ifdef HAVE_DLOPEN /* These must be right or mysqld will not find the symbol! */ extern "C" { my_bool count_agg_vowels_init( UDF_INIT* initid, UDF_ARGS* args, char* message ); void count_agg_vowels_deinit( UDF_INIT* initid ); void count_agg_vowels_reset( UDF_INIT* initid, UDF_ARGS* args,– char* is_null, char *error ); void count_agg_vowels_add( UDF_INIT* initid, UDF_ARGS* args,– char* is_null, char *error ); long long count_agg_vowels( UDF_INIT* initid, UDF_ARGS* args,– char* is_null, char *error ); } struct count_agg_vowels_data {   unsigned long long count; }; /* Count the number of vowels */ my_bool count_agg_vowels_init( UDF_INIT* initid, UDF_ARGS* args, char* message ) {   struct count_agg_vowels_data* data;   if (args->arg_count != 1 || args->arg_type[0] != STRING_RESULT) {     strcpy(message,"You can only pass one argument, and it must be a string");     return 1;   }   initid->max_length = 20;   data = new struct count_agg_vowels_data;   data->count = 0;   initid->ptr = (char*)data;   return 0; } /* free the memory allocated to from ptr */ void count_agg_vowels_deinit( UDF_INIT* initid ) {   delete initid->ptr; } /* called once at the beginning of each group. Needs to call the add function as well resets data->count to 0 for the new group */  void count_agg_vowels_reset( UDF_INIT* initid, UDF_ARGS* args,– char* is_null, char* message ) {   struct count_agg_vowels_data* data = (struct count_agg_vowels_data*)initid->ptr;   data->count           = 0;   *is_null = 0;   count_agg_vowels_add( initid, args, is_null, message ); } /* called for every row, add the number of vowels to data->count */ void count_agg_vowels_add( UDF_INIT* initid, UDF_ARGS* args, char* is_null,– char* message ) {   struct count_agg_vowels_data* data    =– (struct count_agg_vowels_data*)initid->ptr;   char *word = args->args[0];  /* pointer to string */   int I = 0;   char c;   while ( ( c = word[ I++ ] ) != '\0' ) {    switch ( c ) {     case 'a':     case 'e':     case 'i':     case 'o':     case 'u':     data->count++;    }  } } /* returns data->count, or a null if it cannot find anything */ long long count_agg_vowels( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char* error ) {   struct count_agg_vowels_data* data = (struct count_agg_vowels_data*)initid->ptr;   if (!data->count)   {     *is_null = 1;     return 0;   }

  *is_null = 0;   return data->count; } #endif /* HAVE_DLOPEN */ % make count_agg_vowels.o g++ -DMYSQL_SERVER -DDEFAULT_MYSQL_HOME="\"/usr/local/mysql\""  -DDATADIR="\"/usr/local/mysql/var\""  -DSHAREDIR="\"/usr/local/mysql/share/mysql\""  -DHAVE_CONFIG_H -I../innobase/include -I./../include  -I./../regex -I. -I../include -I. -O3 -DDBUG_OFF  -fno-implicit-templates -fno-exceptions -fno-rtti -c repeat_str.cc % gcc -shared -o count_agg_vowels.so count_agg_vowels.cc -I../innobase/include–  -I./../include -I./../regex -I. -I../include -I.

Now connect to MySQL, load the function, and run a test:

mysql> CREATE AGGREGATE FUNCTION count_agg_vowels RETURNS INTEGER  SONAME "count_vowels.so"; Query OK, 0 rows affected (0.02 sec) mysql> SELECT count_agg_vowels(word) FROM words; +------------------------+ | count_agg_vowels(word) | +------------------------+ |                     21 | +------------------------+ 1 row in set (0.01 sec) mysql> SELECT id,count_agg_vowels(word) FROM words GROUP BY id; +------+------------------------+ | id   | count_agg_vowels(word) | +------+------------------------+ |    1 |                      5 | |    2 |                      2 | |    3 |                      2 | |    4 |                      6 | |    5 |                      6 | +------+------------------------+ 5 rows in set (0.00 sec)