With the object extensions of DB2, you can incorporate object-oriented concepts and methodologies into your relational database by extending DB2 with richer sets of data types and functions. With those extensions, you can store instances of object-oriented data types in columns of tables and operate on them using functions in SQL statements. In addition, you can control the types of operations that users can perform on those data types. The object extensions that DB2 provides are large objects (LOBs), distinct types, and UDFs.
Schemas are qualifiers used to qualify user-defined distinct types, UDFs, stored procedures, triggers, sequences, and jars. All the objects qualified by the same schema name can be thought of as a group of related objects. A schema name has a maximum length of 8 bytes. The schema name SYSIBM is used for built-in data types and built-in functions, and SYSPROC is used for some stored procedures delivered by IBM in support of the Control Center and Visual Explain. Stored procedures that were created prior to DB2 version 6 may also use the SYSPROC schema name.
The schema name can be specified explicitly when the object is referenced in the CREATE, ALTER, DROP, or COMMENT ON statement. If the object is unqualified and the statement is dynamically prepared, the SQL authorization ID contained in the CURRENT SQLID special register is used for the schema name of the objects.
Certain authorities are associated with schemas. Schema privileges include CREATIN, ALTERIN, and DROPIN. This allows you to create, alter, or drop objects in the identified schema. If the schema name is the same as an authorization ID, it has those privileges implicitly.
The PATH bind option is applicable to BIND PLAN, BIND PACKAGE, REBIND PLAN, and REBIND PACKAGE. The list of schemas specified is used to resolve unqualified references to user-defined distinct types and UDFs in static SQL statements. This list is also used to resolve unqualified stored procedure names when the SQL CALL statement specifies a literal for the procedure name. The list specifies an ordered list of schemas to be searched to resolve these unqualified references.
The PATH has a corresponding special register. The SET CURRENT PATH statement changes the value of the PATH special register. This PATH special register is used in the same way as the PATH bind optionto resolve unqualified references in dynamic SQLand can also be used to resolve unqualified stored-procedure names in CALL host-variable statements. The PATH bind option is used to set the initial value for this special register. SYSIBM and SYSPROC do not need to be specified as part of the PATH; they are implicitly assumed as the first schema.
A distinct type is the DB2 implementation of a user-defined data type. By using distinct types, you can avoid some excess code in order to support data typing that is not included in the DB2 product. You enable DB2 to do strong-typing: Only functions and operations defined on the distinct type can be applied to instances of the distinct type. This is beneficial for applications so that you do not have to code for comparison errors.
Once the distinct type is defined, column definitions can reference that type during the issuing of the CREATE or ALTER statements, just as they would any DB2 built-in data type. If a distinct type is specified without a schema name, the distinct type is resolved by searching the schemas in the current path.
Distinct types allow you to use DB2 built-in data types in special ways. Distinct types are built off of the DB2 built-in types. Distinct types allow you to extend these types and declare specialized usage on them. DB2 then enforces these rules by performing only the kinds of computations and comparisons that you have defined for the data type.
In order to use a distinct type, you must first create it. Distinct types are created by using one of the DB2 built-in types as a base. You create them by using the CREATE DISTINCT TYPE statement:
CREATE DISTINCT TYPE distinct-type-name AS source-data-type WITH COMPARISONS
The name of the distinct type is a two-part name, which must be unique within the DB2 subsystem. The qualifier is a schema name. The distinct type shares a common internal representation with its source data type. However, the distinct type is considered to be an independent data type that is different from the others.
LONG VARHCAR and LONG VARGRAPHIC cannot be used as source types.
An instance of a distinct type can be compared only with another instance of the same distinct type. For compatibility with DB2 on other platforms, you can also specify the WITH COMPARISONS clause at the end of the CREATE statement. DB2 z/OS allows comparisons only between items of the same distinct type. This clause is not allowed if the source data type is not a LOB type, such as BLOB, CLOB, or DBCLOB. Comparisons for these types are not allowed.
Operations on Distinct Types
Two operations are allowed on distinct types: comparisons and casting. You can compare the values of distinct types (non-LOB), or you can cast between the distinct type and the source type.
Character and arithmetic operators that are used in built-in functions that are used on a source type are not automatically inherited by the distinct type. These operators and functions need to be created explicitly.
The following comparison operators are allowed on distinct types.
Note that LIKE and NOT LIKE are not supported.
Casting functions are used to convert instances of source data types into instances of a different data types. These functions have the name of the target data type and will have a single parameter, which is the source data type. They will return the target data type. Two cast functions are generated by DB2 when the CREATE DISTINCT TYPE is issued. These functions, used to convert between the distinct type and its source type, will be created in the same schema as the distinct type. The following shows an example of creating a distinct type and then using it both with and without casting:
CREATE DISTINCT TYPE EURO AS DECIMAL (9,2) WITH COMPARISONS EURO(DECIMAL) -- where EURO is the target and DECIMAL is the source DECIMAL (EURO) -- where DECIMAL is the target and EURO is the source Without casting using the function name SELECT ITEM FROM INVENTORY WHERE COST > EURO (1000.00) With casting using cast function SELECT ITEM FROM INVENTORY WHERE COST > CAST (1000.00 AS EURO)
Constants are always considered to be source-type values.
If you want to find all items that have a cost greater than 1,000.00 euros, you will have to cast, because you cannot compare the data of type EURO with data of the source data type of the EURO, which is DECIMAL. You will need to use the cast function to cast data from DECIMAL to EURO. You can also use the cast function DECIMAL to cast from EURO to DECIMAL and cast the column COST to type DECIMAL. Depending on the way you choose to castfrom or to the distinct typeyou can use the function-name notation data-type(argument) or the cast notation CAST(argument AS data-type).
The built-in data types come with a collection of built-in functions that operate on them. Some of these functions implement operators, such as the arithmetic operators on numeric data types and substring operators on character data types. Other functions, such as scalar and column functions, are discussed in Chapter 6.
When you create a distinct type, you can also set it up to inherit some or all of the functions that operate on the corresponding source type. This is done by creating new functions, called sourced functions, that operate on the distinct type and duplicate the semantics of the built-in functions that work on the source type. Built-in functions without sourced functions for a distinct type will not be available for use against that distinct type. The following example shows how to create a sourced function:
CREATE FUNCTION '+' (EURO, EURO) RETURNS EURO SOURCE SYSIBM.'+' (DECIMAL(9,2), DECIMAL(9,2))
You can also give distinct types distinct semantics of their own by creating external functions that you write in a host language that will operate on your distinct types.
You need to have privileges granted in order to use distinct types. The GRANT USAGE ON DISTINCT TYPE is used in order to grant privileges to use the distinct type as a column data type in a CREATE or ALTER statement or to use the distinct type as a parameter in a stored procedure or UDF. The GRANT EXECUTE ON enables users to cast functions on a distinct type.
Information about the distinct types is stored in the DB2 catalog. The following tables contain information about distinct types:
UDFs form the basis of object-relational extensions to SQL, along with distinct types and LOBs. Fundamentally, a database function is a relationship between a set of input data values and a result value. DB2 Universal Database comes with many built-in functions; however, it is possible to create your own column, scalar, and table functions.
There are three types of functions in DB2: column, scalar, and table. Column, or aggregate, functions can only be created as sourced functions as described in the section on distinct types. Scalar functions receive parameters and return a single result value. Table functions return an entire table, making them completely different from our common concept of a "function."
In DB2, you can create your own scalar or table functions. A UDF can be written in a high-level programming language such as COBOL, C, C++, or Java, or you can use a single SQL statement.
The built-in column and scalar functions are covered in Chapter 5.
Types of UDFs
UDFs are of four types:
An external UDF is similar to any other program written for the z/OS platform. External UDFs may contain SQL statements, IFI (Instrumentation Facility Interface), or DB2 commands and may be written in assembler, COBOL, C, C++, PL/I, or Java. External UDFs, once written and generated as dynamically loadable libraries or classes, must be registered with the database. An external function is defined to DB2 with a reference to an z/OS load module that DB2 should load when the function is invoked. The z/OS load module contains the object code for the application program that contains the logic of the external function. If the program contains SQL statements, an associated package contains the access path for the embedded SQL statement. External functions cannot be column functions. The following congrat function is an external scalar UDF and is registered using the CREATE FUNCTION statement:
CREATE FUNCTION congrat(VARCHAR(30),VARCHAR(40)) RETURNS VARCHAR(30) EXTERNAL NAME 'CONGRAT' LANGUAGE C PARAMETER STYLE DB2SQL DETERMINISTIC FENCED READS SQL DATA COLLID TEST WLM ENVIRONMENT WLMENV1 NO EXTERNAL ACTION DISALLOW PARALLEL;
DB2 passes parameters to external UDFs in a standard manner, as parameters are passed to stored procedures. DB2 uses the following structure:
A scalar function can return only a single result parameter, whereas table functions return multiple result parameters, each representing a column in a row of the table being returned.
The SQLSTATE can be returned from the external UDF to DB2 to indicate a condition that DB2 can then act on. It is highly recommended that the UDF return a SQLSTATE to the caller.
If the UDF returns a SQLSTATE that is not allowed, DB2 replaces the SQLSTATE with 39001 and returns a SQLCODE of 463. The IBM DB2 Messages and Codes manual documents the valid SQLSTATE values.
Sourced UDFs are registered simply by specifying the DB2 built-in source function. Sourced functions can be scalar functions or column functions but cannot be table functions. Sourced functions are often helpful when use of a built-in function on a distinct type is needed.
This example allows you to create an AVG function for the SCORE data type:
CREATE FUNCTION AVG (SCORE) RETURNS SCORE SOURCE SYSIBM.AVG(DECIMAL);
These CREATE FUNCTION statements place an entry for each UDF in the SYSIBM.SYSROUTINES catalog table, and the parameters are recorded in SYSIBM.SYSPARMS. These catalog tables can be queried for information about the UDFs.
SQL Scalar UDFs
An SQL scalar function is a UDF whose entire functionality is a single SQL expression and is coded into the CREATE FUNCTION statement. The function is identified as an SQL scalar function by coding the LANGUAGE SQL option of the CREATE FUNCTION statement. This enables you to code an expression used commonly within more than one statement and to modularize that expression by storing it separately as a UDF. Any SQL statement can then reference the UDF in the same manner in which any scalar function can be invoked. This enables common expressions to be coded only once and stored separately in the DB2 catalog, centralizing the coding and administration of these types of functions.
The SQL expression is specified in the RETURN clause of the CREATE FUNCTION statement and can contain references to the function input parameters, as in the following example, which computes the total number of months between two dates:
CREATE FUNCTION TOTMON (STARTX DATE, ENDY DATE) RETURNS INTEGER LANGUAGE SQL CONTAINS SQL NO EXTERNAL ACTION DETERMINISTIC RETURN ABS( (YEAR(STARTX - ENDY)*12) + MONTH(STARTX - ENDY) );
The expressions contained in the SQL scalar UDF cannot contain references to columns names or host variables. However, an SQL scalar UDF can invoke other UDFs, which may be external UDFs that can be an SQL program.
The source code for an SQL scalar function is stored in the SYSIBM.SYSVIEWS DB2 catalog table. When an SQL statement referencing an SQL scalar function is compiled, the function source from the SYSIBM.SYSVIEWS catalog table is merged into the statement. Package and plan dependencies on the SQL scalar functions, as with all UDFs, are maintained in the SYSIBM.SYSPACKDEP and SYSIBM.SYSPLANDEP tables, respectively.
With DB2, you can also create another type of UDF: a table function, which returns a table to the SQL statement that calls it. This means that a table function can be referenced only in the FROM clause of a SELECT statement. The table function provides a means of including external data or complex processes in SQL queries. Table functions can read non-DB2 datafor instance, a file on the operating system or over the World Wide Webtabularize it, and return the data to DB2 as a relational table that can subsequently be treated like any other relational table. For example, the APPFORM table function in the next example takes in a candidate application form, processes it, and returns the dataexcept for the candidate ID, which is generatedin an appropriate format to be inserted in the CANDIDATE table:
CREATE FUNCTION APPFORM(VARCHAR(30)) RETURNS TABLE (LNAME VARCHAR(30), FNAME VARCHAR(30), INITIAL CHAR(1), HPHONE PHONE, WPHONE PHONE, STREETNO VARCHAR(8), STREETNAME VARCHAR(20), CITY VARCHAR(20), PROV_STATE VARCHAR(30), CODE CHAR(6), COUNTRY VARCHAR(20)) EXTERNAL NAME APPFORM LANGUAGE C WLM ENVIRONMENT WLMENV1 PARAMETER STYLE DB2SQL NO SQL DETERMINISTIC NO EXTERNAL ACTION FINAL CALL DISALLOW PARALLEL CARDINALITY 20;
If we wanted to insert into the CANDIDATE table a new candidate, based on his or her application form, we could use the following SELECT statement:
INSERT INTO CANDIDATE SELECT CID, LNAME, FNAME, INITIAL, HPHONE, WPHONE, STREETNO, STREETNAME, CITY, PROV_STATE, CODE, COUNTRY FROM TABLE(APPFORM(' \DOCS\NEWFORM.TXT')) AS AP
Scalar UDFs are invoked in much the same way as any built-in DB2 scalar function. A function name identifies the function, and one or more parameters pass information from the invoking SQL statement to the UDF. The parameters passed can be table columns, constants, or expressions. If an expression is passed to an external UDF, DB2 resolves the expression and then passes the result to the UDF. The result of the UDF execution replaces the function invocation at execution time. In the following example, the SQL scalar function TOTMON is used to calculate the number of months between two dates:
SELECT HIREDATE, BIRTHDATE, TOTMON(HIREDATE, BIRTHDATE) as total_months, FROM DSN8710.EMP;
Here, the HIREDATE and BIRTHDATE columns are selected from the EMP table, and the TOTMON function, previously defined in this chapter, determines the total number of months between the two dates, which were fed to the function as parameters. In this case, the TOTMON function, being an SQL scalar function, is merged with the statement during statement compilation as if the expression itself were coded within the SQL statement.
A UDF can be defined as deterministic or not deterministic. A deterministic function will return the same result from one invocation to the next if the input parameter values have not changed. Although DB2 does not have a mechanism to "store" the results of a deterministic function, the designation can impact the invoking query execution path relative to materialization. If a table expression has been nested within an SQL statement, a nondeterministic function may force the materialization of the inner query. For example:
SELECT WORKDEPT, SUM(TOTAL_MONTHS), AVG(TOTAL_MONTHS) FROM (SELECT WORKDEPT, TOTMON(HIREDATE, BIRTHDATE) as total_months, FROM DSN8710.EMP) AS TAB1 GROUP BY WORKDEPT;
Here, if an index is on the WORKDEPT column of the EMP table, the inner table expression called TAB1 may not be materialized but rather be merged with the outer select statement. This is possible because the TOTMON function is deterministic. If the TOTMON function were not deterministic, DB2 would have to materialize the TAB1 table expression, possibly storing it in the DSNDB07 temporary table spaces and sorting to perform the desired aggregation. However, it is not clear whether having TOTMON be deterministic is a good thing. If the TOTMON function is CPU intensive, it may be better to materialize the result of the inner table expression because the merged TOTMON function, if it is deterministic, will be executed twice in the outer query: once per referencefor the SUM and AVG functions, in this case.
A table function can be referenced in an SQL statement anywhere that a table can normally be referenced. The table function, or a nested table expression, is identified in the query by the use of the TABLE keyword, as in the following example:
SELECT TAB1.EMPNO, TAB2.TEMPURATURE, TAB2.FORECAST FROM EMP, TABLE(WEATHERFUNC(CURRENT DATE)) AS TAB2
In this query, the TABLE keyword identifies a nested table expression called TAB2 that was an invocation of the table UDF called WEATHERFUNC. The query returns the employee number and some weather information in some of the columns that are returned from the WEATHERFUNC table function. This is a fairly simple invocation of a table function.
More important, you can embed correlated references within a nested table expression. Although the weather may not be useful information to return with employee data, perhaps retrieving the resume and credit information from an external source is. In this case, you can pass the employee number as a correlated reference into the table expression identified by the TABLE keyword and ultimately pass it into the table UDF:
SELECT TAB1.EMPNO, TAB2.RESUME, TAB2.CREDITINFO FROM EMP AS TAB1, TABLE(EMPRPT(TAB1.EMPNO)) AS TAB2
The TABLE keyword tells DB2 to look to the left of the keyword when attempting to resolve any otherwise unresolvable correlated references within the table expression. If the join was coded in reversethat is, the invocation of the EMPRPT table UDF appears in the statement before the EMP tablethe correlated reference to the TAB1.EMPNO column would not have been resolved, and the statement would not have compiled successfully.
The use of the TABLE keyword can be expanded beyond that of correlated references as input into table UDFs. The same keyword can be used with a nested table expression that may benefit from a correlated reference. This can be especially useful when the nested expression is performing an aggregation and needs to work on only a subset of the data in the table it is accessing. The following example lists the employee number and salary of each employee, along with the average salary and head count of all employees in their associated departments. This is traditionally coded as a left outer join of two table expressions, the first getting the employee numbers and salaries and the second calculating the head count and average salary for all departments. If there is filtering against the employee table, the entire table might be unnecessarily read to perform the aggregations:
SELECT TAB1.EMPNO, TAB1.SALARY, TAB2.AVGSAL,TAB2.HDCOUNT FROM (SELECT EMPNO, SALARY, WORKDEPT FROM DSN8610.EMP WHERE JOB='SALESREP') AS TAB1 LEFT OUTER JOIN (SELECT AVG(SALARY) AS AVGSAL, COUNT(*)AS HDCOUNT, WORKDEPT FROM DSN8610.EMP GROUP BY WORKDEPT) AS TAB2 ON TAB1.WORKDEPT = TAB2.WORKDEPT;
Here, the entire EMP table has to be read in the TAB2 nested table expression in order to calculate the average salary and headcount for all departments. This is unfortunate, because we need only the departments that employ sales reps. We can use the TABLE keyword and a correlated reference to TAB1 within the TAB2 expression to perform filtering before the aggregation:
SELECT TAB1.EMPNO, TAB1.SALARY, TAB2.AVGSAL,TAB2.HDCOUNT FROM DSN8610.EMP TAB1 ,TABLE(SELECT AVG(SALARY) AS AVGSAL, COUNT(*) AS HDCOUNT FROM DSN8610.EMP WHERE WORKDEPT = TAB1.WORKDEPT) AS TAB2 WHERE TAB1.JOB = 'SALESREP';
Polymorphism and UDFs
DB2 UDFs subscribe to the object-oriented concept of polymorphism. Ad hoc polymorphism, better described as overloading, allows an SQL statement to issue the same function against varying parameter lists and/or data types. This overloading requires you to create a unique definition for each variation of a particular function in data types or number of parameters. Polymorphism means "many changes"; for DB2 functions, many functions can have the same name.
These functions are identified by their signature, comprising the schema name, the function name, the number of parameters, and the data types of the parameters. This enables you to create UDFs for your distinct types. These sourced UDFs can assume the same name as the UDFs of built-in functions they are sourced from but are unique in the system, owing to the data type of their parameter(s). Signatures also allow you to define SQL or external UDFs to accommodate any variation in data type or number of parameters. For example, if you need a variation of the TOTMON function that accommodates timestamps, you can create the following function:
CREATE FUNCTION TOTMON (STARTX TIMESTAMP, ENDY TIMESTAMP) RETURNS INTEGER LANGUAGE SQL CONTAINS SQL NO EXTERNAL ACTION NOT DETERMINISTIC RETURN ABS( (YEAR(STARTX - ENDY)*12) + MONTH(STARTX - ENDY) );
The only difference between this TOTMON and the original TOTMON is that the input parameters here are TIMESTAMPs instead of DATEs. From the application programming point of view, this enables an SQL statement to issue a TOTMON function, regardless of whether it is using a pair of dates or timestamps as input. However, the people responsible for deploying the UDFs must do so with consistency of functionality for like-named functions.
External UDF Execution
The external scalar and table UDF programs execute in a z/OS WLM environment, in much the same way as stored procedures. The WLM environment is supported by one or more WLM address spaces. The WLM keywords that name the WLM environment in which to execute should be specified when creating the function; otherwise, the program defaults to the WLM environment specified at installation time. This default environment can be seen in the SYSIBM.SYSROUTINES catalog table. UDFs execute under the same thread as the invoking program and will run at the same priority, using the WLM enclave processing.
Monitoring and Controlling UDFs
You can invoke UDFs in an SQL statement wherever you can use expressions or built-in functions. UDFs, like stored procedures, run in WLM-established address spaces. DB2 UDFs are controlled by the following commands.
STOP FUNCTION SPECIFIC stops an external function. Use the START FUNCTION SPECIFIC command to activate all or a specific set of stopped external functions.
To prevent DB2 from accepting SQL statements with invocations of the specified functions, issue the following statement:
STOP FUNCTION SPECIFIC (specific-function-name)
The optimizer will use statistics, if available, for estimating the costs for access paths where UDFs are used. The statistics that the optimizer needs can be updated by using the SYSSTAT.FUNCTIONS catalog view. The statistics report contains a field that allows you to view the maximum level of indirect SQL cascading, including cascading stemming from triggers, UDFs, or stored procedures (see Figure 15-3).
Figure 15-3. Accounting report information for UDF operations
User-defined table functions add access cost to the execution of an SQL statement. In order for DB2 to determine the cost factor for the use of user-defined table functions in the selection of the best access path for an SQL statement, the total cost of the user-defined table function must be determined. This cost has three components:
To determine the elapsed and CPU time spent for UDF operations, you can view an accounting report (see Figure 15-3).
The SYSIBM.SYSROUTINES catalog table describes UDFs. To retrieve information about UDFs, you can use the following query:
SELECT SCHEME, NAME, FUNCTION_TYPE, PARM_COUNT FROM SYSIBM.SYSROUTINES WHERE ROUTINETYPE='F'