Large Objects | DB2 for z/OS Version 8 DBA Certification Guide

Three DB2 data types support LOBs:

A BLOB (binary large object) has binary strings, not associated with a CCSID, and is good for storing image, voice, and sound data.
A CLOB (character large object) has strings made of single-byte characters or single-/double-byte characters with an associated CCSID and is used if data is larger than VARCHAR allows.
A DBCLOB (double-byte character large object) has strings made of double-byte characters with an associated CCSID and is good for storing documents.

Each of these data types can contain up to 2GB of data, although in most cases, the amount of storage for individual columns will be considerably less, depending on the type of data stored.

Today, the 32KB-long VARCHAR column, which has limitations in both size and functionality, is widely used. Most of this use will probably be replaced by using LOBs in the future, especially for new applications and functions.

LOBs are implemented with structures that are different from normal tables and table spaces. A LOB table space must be created for each columnor each column of each partitionof the base table. This table space contains the auxiliary table, which must have an auxiliary index associated with it. The LOB table space has a different recovery scheme, optional logging, and different locking options. For information on LOB locking, refer to Chapter 16.

If a table contains a LOB column and the plan or package is bound with SQLRULES(STD), DB2 will implicitly create the LOB table space, the auxiliary table, and the auxiliary index. DB2 will choose the name and characteristics for these implicitly created objects.

NOTE

It is better to develop naming standards beforehand for these objects and to control placement of them. This is critical for both management and performance.

Inserting and Loading LOBs

LOB loading and insertion is also different from processes for non-LOB data. The methods are also entirely different, depending on whether extenders are used. Without extenders, some limitations need to be addressed when inserting LOB data, primarily the 32KB limit and logging impacts. If the total length of the LOB column and the base table row is less than 32KB, the LOAD utility can insert the LOB column. When the limits of LOAD are exceeded, SQL INSERT or UPDATE statements need to be used. But the SQL INSERT has its own limitations in that enough memory needs to be available to hold the entire value of the LOB. The limitations are the amount of memory available to the application and the amount of memory that can be addressed by the language used. If the LOBs are all small, it is not as much of an issue, as memory and language constructs will be available.

When dealing the very large LOBs, the differences can be seen easily when comparing the C language construct with COBOL. For example, following is the C language for a LOB host variable:

 SQL TYPE IS CLOB(20000K) my_clob;

This is generated by DB2 as

 struct { unsigned long length;          Char data[20960000];        } my_clob;

The COBOL language for a LOB host variable is

 01 MY-CLOB     USAGE IS SQL TYPE IS CLOB(20000K).

This is generated by DB2 as

 01  MY-CLOB.     02  MY-CLOB-LENGTH    PIC 9(9) COMP.     02 MY-CLOB-DATA.        49  FILLER     PIC  X(32767).        49  FILLER     PIC  X(32767).        49  FILLER     PIC  X(32767). --Repeated 622 times

Extenders can assist in solving the problem. When a table and a column are enabled for an extender, the whole process changes. An INSERT statement can be used in the program with extender functions (UDFs) that allow the image to be loaded into the database directly from an external file. With the image extender, for example, the image content is inserted into an administrative support table, and another record is then inserted into another administrative table describing the attributes of the image, such as number of colors, thumbnail-sized version, and format characteristics (JPEG, TIFF, and so on). The extenders require WLM to be installed in support of the extender UDFs and stored procedures, and the use of WLM in goal mode, for performance reasons.

Even though the LOB data is stored in an auxiliary table, the INSERT statement specifies the base table. You can read the LOB data from a file in your DB2 program and place the data into the declared DB2 LOB variable. The insert statement then simply references the LOB variable. For example, if you wanted to insert employee resumes into the EMP_RESUME table, which includes a 2MB CLOB data type to hold resumes, you would declare the resume variable in your program:

 SQL TYPE IS CLOB(2000K) resume;

Then you would populate the resume variable with the CLOB data and perform the insert:

 EXEC SQL INSERT INTO EMP_RESUME VALUES (:EMPNO, :RESUME);

DB2 uses contiguous storage in data spaces to store LOBs that your program is manipulating. Because LOBs can be quite large, DB2 avoids materializing them until completely necessary. The amount of storage required depends on the size of the LOBs and the number of LOBs referenced within a statement. The amount of storage required by your program and DB2 can become quite large. For this reason, you can use LOB locators to manipulate LOB data without retrieving that data from the DB2 table. A LOB locator is declared in the application program:

 SQL TYPE IS CLOB_LOCATOR resume_loc;

An SQL statement can reference the locator, and the LOB is not materialized in DB2 until absolutely necessary, and it is never moved into the application memory:

 SELECT RESUME INTO   :resume_loc FROM   EMP_RESUME WHERE  EMPNO=:empno;

Further SQL statements can reference the locator variable, allowing the LOB to be manipulated in various ways. One way would be by using SQL SET commands. DB2 will manipulate the LOB data within the table and materialize the LOB only as needed.

LOBs and Distinct Types

Being able to store LOBs and to manipulate them through extenders is only part of the story. New distinct data types can be defined based on the needs of particular applications. A distinct type provides a way to differentiate one LOB from another LOB, even of the same base type, such as BLOB or CLOB. A distinct type is not limited to objects and can be created for standard data types, as well as LOBs.

Even though stored as binary or character LOBs, image, video, spatial, XML, and audio objects are treated as types distinct from BLOBs and CLOBs and distinct from one another. For example, suppose that an application that processes spatial data features needs a polygon data type. You can create a distinct type named polygon for polygon objects, as follows:

 CREATE DISTINCT TYPE polygon AS BLOB (512K)

The polygon-type object is treated as a distinct type of object, even though, internally, it is represented as a 512KB binary object (BLOB). Distinct types are used like SQL built-in types to describe the data stored in columns of tables. The extenders create distinct data types for the type of object they process, such as image, audio, and video, which makes it easier for applications to incorporate these types of objects:

 CREATE TABLE DB2_MAG_DEMO    (GEO_ID CHAR(6),     EURO_ANNUAL    EURO_DOLLAR,     OZ_ANNUAL      AUSTRALIAN_DOLLAR,     US_ANNUAL      US_DOLLAR,     DEMO_GRAPHIC   POLYGON)

Casting functions allow operations between different data typesfor example, comparing a slide from a video distinct type to an expression. You must cast the expression to a distinct typevideo, in this examplein order for the comparison to work. Some casting functions are supplied with DB2CHAR, INTEGER, and so onand others are created automatically whenever a distinct type is created with the CREATE DISTINCT TYPE statement.

LOBs and Extenders

Extenders for DB2 help in the use of LOBs, the base storage for the object-relational environment. Extenders are complete packages that define distinct data types and special functions for many types of LOBs, including image, audio, video, text, XML, and spatial objects. Thus, you do not have to worry about defining these data types and functions in applications. You can use SQL and functions to manipulate these data types. LOBs can range in size from relatively small to extremely large and can be cumbersome to deal with.

The DB2 extenders installation package includes a Software Developers Kit (SDK) and a client and server runtime environment. DB2 extender applications can be executed in a server machine that has the extender client runtime codeautomatically installed when the server runtime code is installedand server runtime code. Extender applications can also be run on a client machine with the client runtime code, and you must ensure that a connection can be made to the server.

When storing image, audio, video, or text objects, you do not store the object in the user table but instead use extender-created character strings, referred to as handles, that represent the objects; the handle will be stored in the user tables. The object is stored in an administrative support table, or file identifier if the content is a file. The attributes and handles are also stored in these administrative tables. The extender then links the handle in the user table to the object and its attributes in the administrative tables.

When an extender for a particular LOB type is used, additional options allow the data to be stored in its native format in separate files, such as a picture that is a single JPEG file. In this example, the hierarchical path name would be stored in support tables that would allow the extender to use this indirect reference to process the data. The extenders also require administrative support tables that vary based on the extender used. The content of these metadata tables enables the extenders to appropriately handle user requests, such as inserting audio, displaying images, and so on. These tables identify base tables and columns that are enabled for the extender and reference other support tables used to hold attribute information about LOB columns. Triggers supplied by the extenders are used to update many of these support tables when underlying LOB data is inserted, updated, or deleted. Six extenders are available in the DB2 family: image, audio, video, text, XML, and spatial. Many others are planned, and vendors also supply extender packages.

Applications generally use SQL to retrieve pointers to the data, and UDFs are used to assist with more complex and unique operations. Extender APIs will be more commonly used, as all the coding is supplied for dealing with the LOBs. The extender for image data comes with 18 UDFs; audio has 27 UDFs, video has 18 UDFs, and the QBIC (Query by Image Content) API has 24 UDFs.

For applications, the advantages of "not-having-to-program-it" and easing the pain of the learning curve are considerable. For example, several formats are supported for dealing with the image extenders. The common onesBMP, EPS, GIF, JPG, TIFare provided, of course, along with more than 15 others. Thus, each LOB picture could be of a different format, but the program would not have to be aware of this because it would be taken care of by the extender. The same is true of the text extender. A user could browse through a series of text documents, one in Microsoft Word format, another in WordPerfect, and so on.

The greatest power of the LOB extenders is their searching ability. For example, with the text extender, searching can be done by soundex, synonym, thesaurus, proximity, linguistic, and several other criteria. With images, the QBIC API is supplied to allow searching by image content, and this is a very extensive and powerful API.

Application programming for objects generally requires the use of the extenders. Without their use, little can be done without extensive user programming. The power of objects comes with the UDFs and API libraries that are packaged with the extenders, allowing an application to easily store, access, and manipulate any of the supported object types. Although only six extenders are available now, many others are in development and will be released in the future as they are completed.

The application programmer will be able to use UDFs in the SQL to position to the necessary LOB and then use an API to manipulate it in order to display a picture on the screen, for example. Thus, two completely different libraries can be strategized from. As a simple example, without forcing it to match any particular programming language, the following represents first storing a picture in a LOB and then displaying it on the screen. First, insert the data into the LOB by using the DB2IMAGE extender:

 EXEC SQL BEGIN DECLARE SECTION;    storage_type; EXEC SQL END DECLARE SECTION; SET storage_type = MMDB_STORAGE_TYPE_INTERNAL EXEC SQL INSERT INTO MY_PERSONAL_DIGITAL_PICTURES    VALUES ('OZ TRIP 2',          'Sydney Opera House',          DB2IMAGE (               CURRENT SERVER,               'c:/My Pictures/1999/Australia/OpraHse.jpg',               'ASIS',               :storage_type));

Second, retrieve and display the data on the screen, using API DBiBROWSE:

 EXEC SQL BEGIN DECLARE SECTION;    image_handler; EXEC SQL END DECLARE SECTION; EXEC SQL SELECT PICTURE INTO :image_handler    WHERE NAME = 'Sydney Opera House'; Set return_code to DBiBROWSE("ib %s,    MMDB_PLAY_HANDLE,    image_handler,    MMDB_PLAY_BI_WAIT);

From the pseudocode, it is easy to see that the extenders offer significant power and enable applications to be written quickly. In addition, most of the work is going on at the server, and the client is simply the recipient of all that power. When implementing extenders, keep in mind that the program will need to have enough memory available to support the use of LOBs on GUI clients.

Text Extenders

Text extenders bring full-text retrieval to SQL queries for searching large text documents intelligently. With the use of text extenders, you can search several thousand large text documents very quickly. You can also search on the basis of word variations and synonyms. These documents can be stored directly in the database or in a separate file.

Text extenders allow applications to

Search documents of several languages and formats
Perform wildcard searches using masks
Perform a search for words that sound like the search input
Perform fuzzy searches for like words (various spellings)
Perform searches for specific text, synonyms, phrases, and proximity
Perform free-text searches with natural-language input

Native word-processing documents can be searched by keywords, wildcards, phrases, and proximity. IBM has built into these text extenders a high-performance linguistic search technology, giving you multiple options for searching and retrieving documents. These text searches can be integrated with your normal SQL queries, enabling you to integrate into your SELECT statements the ability to perform attribute and full-text searches very easily.

The following example shows how to perform this integration. This example performs a SELECT from a table that also performs a search on a specified document, using a text extender called DB2TX.CONTAINS. A legal cases document is being searched to see whether the words malpractice and won appear in the same paragraph for cases occurring after 1990-01-01. LEGCSE_HANDLE refers to the column LEGCSE that contains the text document:

 SELECT DOC_NUM, DOC_DATE FROM LEGAL_CLAIMS WHERE DOC_DATE > '1990-01-01' AND DB2TX.CONTAINS     (LEGCSE_HANDLE,    "malpractice"    IN SAME PARAGRAPH AS "won") = 1

Indexing Text Extenders

Scans are just as undesirable in text documents as they are within DB2 tables. Indexes need to be created so that sequential scans of documents are not necessary. By using a text index, you can speed up the searches performed on these documents.

A text index contains important words and a list of words known as stop words, such as and and the, which will not be in a text index. This list can be modified, but you would want to do it only once, at installation time. When a request is made, the index is searched for the important terms to determine which documents contain those specified terms.

To set up a text index, you first record the text documents that need to be indexed in a log table. This process occurs when a DB2 trigger is fired off during an insert, update, or delete of a column of a text document. Then, when the terms are inserted or updated in the text document, they are added to the index. They are also deleted from the index if they are deleted from the text document.

Text indexes are of four types, and the type must be established before you implement columns that will be using text extenders. Not all search options are available by all index types, so you want to make sure that the index will suit your criteria for searching.

Linguistic indexes perform linguistic processing during the analysis for the text when creating an index. Before being inserted into the index, a word is reduced to its base form. Queries also use linguistic processing when searching against this index. This index requires the least amount of space, but searches may be longer than those done against a precise index.
Precise indexes use search terms that are exactly as they are in the text document and are case-sensitive. The same processing is used for the query search terms, so they must match exactly. The search can be broadened by using masks. This index provides a more precise search, and the retrieval and indexing are fast, but more space is required for its storage.
Dual indexes are combinations of linguistic and precise indexes. This allows the user to decide which type of search to use. This type of index requires the most amount of disk space. It is slower for searching and indexing than the linguistic indexes and is not recommended for a large number of text documents.
Ngram indexes are used primarily for indexing DBCS documents; they analyze text by parsing sets of characters. This index type also supports fuzzy searches.

When creating tables that will support the ability to search text using extenders, you must consider a few design options. You can create one text index for all text columns in the table, or you can have several different text indexes, one for each text column. Using separate indexes for each text column offers the most flexibility in terms of support for your searches. It also gives you other options, such as how frequently the index is updated and where it is stored. One common index is easier to maintain but is less flexible. If your indexes are large, consider storing them on separate disks, especially if you expect to have concurrent access to the indexes.

You can also have multiple indexes on a single text column. You may want to do this if you need to allow different types of searches on a text column. And just like other DB2 indexes, these indexes will need to be reorganized. If you have a text column that is continually updated, you will need to reorganize it. However, when using these indexes, the text extender automatically reorganizes them in the background. Despite this feature, you still may have to reorganize an index manually every so often, depending on its volatility. This is done with the REORGANIZE INDEX command. Issue the GET INDEX STATUS command to see whether an index needs reorganization.

Frequency of Index Updates

When text documents are added, deleted, or changed, their content must be synchronized with the index. This information is automatically stored by triggers in a log table, and the documents will be indexed the next time an index update is executed.

The indexes can be immediately updated via the UPDATE INDEX command, but it is easier to have this performed automatically on a periodic basis. This time-based information is kept in an environment variable called DB2TXUPDATEFREQ, which provides default settings that can be changed with the ENABLE TEXT COLUMN or ENABLE TEXT TABLE command. For an existing index, you can use the CHANGE INDEX SETTINGS command to change the variable settings.

The variable for determining when indexing should occur is based on the minimum number of queued text documents in the log table; when this minimum is reached, the index is updated. Because updating indexes is a very resource-intensive and time-consuming task, this frequency should be set carefully.

Catalog View for Text Extenders

A catalog view is created for each subsystem when you run the ENABLE SERVER. This view, DB2TX.TEXTINDEXES, has information about the tables and the columns that have been enabled for the text extender. The entries are made during table, column, or external file enablement. If they are disabled, the row is removed. You can view the entries in the catalog view via SQL. In this view, you can see such information as how often the indexes are scheduled for updates, whether you have a multiple-index table, and the type of index.

Image, Audio, and Video Extenders

The DB2 video extender can store as many as three representative frames per shot. Displaying the frames gives you a quick yet effective view of a video's content. The DB2 video extender provides sample programs that demonstrate how to build and display a video storyboard.

Video storyboards allow you to preview videos before you download and view them, which can save you time and reduce video traffic on the network. When image data is placed into a table using the DB2IMAGE UDFs, many processes are performed for the application automatically. The following code demonstrates using this function:

 EXEC SQL INSERT INTO CONSULTANTS VALUES(     :cons_id,     :cons_name,     DB2IMAGE(         CURRENT SERVER,          '/RYC/images/current.bmp'          'ASIS',         MMDB_STORAGE_TYPE_INTERNAL,         :cons_picture_tag);

In this particular example, the DB2IMAGE reads all the attributes about the imageheight, width, colors, layers, pixels, and morefrom the source image file header: in this case, the current.bmp. All the input is of a standard supported format, and all graphic files contain header information about the structure of the content. The function then creates a unique handle for the image and records all the information in the support table for administrative control of the image. This table contains

The handle for the image
A timestamp
The image size, in bytes
The comment contained in :cons_picture_tag
The content of the image

The content of the image source file is inserted into the LOB table as a BLOB. No conversion is done, and the image is stored in its native format. A record in the administrative table contains all the image-specific attributes, such as the number of colors in the image, as well as a thumbnail-sized version of the image.

This example uses the storage type constants; MMDB_STORAGE_TYPE_INTERNAL was used to store the image into a database table as a BLOB. By using the extenders, we could have stored it elsewhere. If you want to store the object and have its content remain in the original file on the server, you can specify the constant MMDB_STORAGE_TYPE_EXTERNAL. Your use of LOBs does not mean that they have to be in DB2-managed tables. The administrative support table for image extenders tells where the LOB is stored. This does require UNIX System Services support on z/OS. A performance perspective requires many considerations as to where the LOB is stored, how it is used, where it is materialized, and so on.

XML Extender

XML has been added to the list of available extenders. For the next generation of business-to-business (B2B) e-commerce solutions, XML is the standard for data interchange. With the XML extender for DB2, you will be able to leverage your critical business information in DB2 databases in order to engage in B2B solutions using XML-based interchange formats.

In terms of Web publishing, you can use XML documents stored in DB2 in a single column or as a collection of data items in multiple columns and tables. The text extender in DB2 supports structured documents, such as XML. The powerful search functions provided can now be applied to a section or a list of sections within a set of XML documents, significantly improving the effectiveness of the search. Additionally, specific XML elements or attributes can be automatically extracted into traditional SQL data types to leverage DB2's sophisticated indexing and SQL query capabilities. The DB2 XML extender also supplies a visual administration tool for easy definitions for mapping elements and attributes from an XML document into columns and tables.