Chapter 11 - Character Functions

oracle pl/sql programming, 2nd edition

Chapter 11
 

11. Character Functions

Contents:
Character Function Descriptions
Character Function Examples

A character function is a function that takes one or more character values as parameters and returns either a character value or a number value. The Oracle Server and PL/SQL provide a number of different character datatypes, including CHAR, VARCHAR, VARCHAR2, LONG, RAW, and LONG RAW. In PL/SQL, the three different datatype families for character data are:

VARCHAR2

A variable-length character datatype whose data is converted by the RDBMS

CHAR

The fixed-length datatype

RAW

A variable-length datatype whose data is not converted by the RDBMS, but instead is left in "raw" form

When a character function returns a character value, that value is always of type VARCHAR2 (variable length), with the following two exceptions: UPPER and LOWER. These functions convert to upper- and lowercase, respectively, and return CHAR values (fixed length) if the strings they are called on to convert are fixed-length CHAR arguments.

PL/SQL provides a rich set of character functions that allow you to get information about strings and modify the contents of those strings in very high-level, powerful ways. Table 11.1 shows the character functions covered in detail in this chapter. The remaining functions (not covered in this chapter) are specific to National Language Support and Trusted Oracle.


Table 11.1: The Built-In Character Functions

Name

Description

ASCII

Returns the ASCII code of a character.

CHR

Returns the character associated with the specified collating code.

CONCAT

Concatenates two strings into one.

INITCAP

Sets the first letter of each word to uppercase. All other letters are set to lowercase.

INSTR

Returns the location in a string of the specified substring.

LENGTH

Returns the length of a string.

LOWER

Converts all letters to lowercase.

LPAD

Pads a string on the left with the specified characters.

LTRIM

Trims the left side of a string of all specified characters.

REPLACE

Replaces a character sequence in a string with a different set of characters.

RPAD

Pads a string on the right with the specified characters.

RTRIM

Trims the right side of a string of all specified characters.

SOUNDEX

Returns the "soundex" of a string.

SUBSTR

Returns the specified portion of a string.

TRANSLATE

Translates single characters in a string to different characters.

UPPER

Converts all letters in the string to uppercase.

11.1 Character Function Descriptions

The following sections briefly describe each of the PL/SQL character functions.

11.1.1 The ASCII function

The ASCII function returns the NUMBER code that represents the specified character in the database character set. The specification of the ASCII function is:

FUNCTION ASCII (single_character IN VARCHAR2) RETURN NUMBER

where single_character is the character to be located in the collating sequence. Even though the function is named ASCII, it will return the code location in whatever the database character set is set to, such as EBCDIC Code Page 500 or 7-bit ASCII. For example, in the 7-bit ASCII character set, ASCII (`a') returns 97. Remember that the collating code for uppercase letters is different from that for lowercase letters. ASCII (`A') returns 65 (in the 7-bit ASCII character set) because the uppercase letters come before the lowercase letters in the sequence.

If you pass more than one character in the parameter to ASCII, it returns the collating code for the first character and ignores the other characters. As a result, the following calls to ASCII all return the same value of 100:


ASCII ('defg') ==> 100 ASCII ('d') ==> 100 ASCII ('d_e_f_g') ==> 100

11.1.2 The CHR function

The CHR function is the inverse of ASCII. It returns a VARCHAR2 character (length 1) that corresponds to the location in the collating sequence provided as a parameter. The specification of the CHR function is:


FUNCTION CHR (code_location IN NUMBER) RETURN VARCHAR2

where code_location is the number specifying the location in the collating sequence.

The CHR function is especially valuable when you need to make reference to a nonprintable character in your code. For example, the location in the standard ASCII collating sequence for the newline character is ten. The CHR function therefore gives me a way to search for the linefeed control character in a string, and perform operations on a string based on the presence of that control character.

You can also insert a linefeed into a character string using the CHR function. Suppose I have to build a report that displays the address of a company. A company can have up to four address strings (in addition to city, state, and zipcode). I need to put each address string on a new line, but I don't want any blank lines embedded in the address. The following SELECT will not do the trick:


SELECT name, address1, address2, address3, address4,        city || ', ' || state || ' ' || zipcode location   FROM company;

Assuming each column (report field) goes on a new line, you will end up using six lines per address, no matter how many of these address strings are NULL. For example:


HAROLD HENDERSON 22 BUNKER COURT SUITE 100 WYANDANCH, MN 66557

You can use the CHR function to suppress these internal blank lines as follows:


SELECT name ||        DECODE (address1, NULL, NULL, CHR (10) || address1) ||        DECODE (address2, NULL, NULL, CHR (10) || address2) ||        DECODE (address3, NULL, NULL, CHR (10) || address3) ||        DECODE (address4, NULL, NULL, CHR (10) || address4) ||        CHR (10) ||        city || ', ' || state || ' ' || zipcode   FROM company;

Now the query returns a single formatted column per company. The DECODE statement offers IF-THEN logic within SQL and executes as follows: "If the address string is NULL then concatenate NULL; otherwise insert a linefeed character. Then concatenate the address string."

In this way, blank address lines are ignored. If I now use Wrap on the report field which holds this string, the address will be scrunched down to:


HAROLD HENDERSON 22 BUNKER COURT SUITE 100 WYANDANCH, MN 66557

11.1.3 The CONCAT function

The CONCAT function concatenates by taking two VARCHAR2 strings and returning those strings appended together in the order specified. The specification of the CONCAT function is:


FUNCTION CONCAT (string1 IN VARCHAR2, string2 IN VARCHAR2)                 RETURN VARCHAR2

CONCAT always appends string2 to the end of string1. If either string is NULL, CONCAT returns the non-NULL argument all by its lonesome. If both strings are NULL, CONCAT returns NULL. Here are some examples of uses of CONCAT:


CONCAT ('abc', 'defg') ==> 'abcdefg' CONCAT (NULL, 'def') ==> 'def' CONCAT ('ab', NULL) ==> 'ab' CONCAT (NULL, NULL) ==> NULL

I have a confession to make about CONCAT: I have never used it once in all my years of PL/SQL coding. In fact, I never even noticed it was available until I did the research for this book. How can this be? Did I never have to concatenate strings together in my programs? No, I certainly have performed many acts of concatenation in my time. Surprisingly, the answer is that PL/SQL (and the Oracle RDBMS) offers a second concatenation operator -- the double vertical bars (||). This operator is much more flexible and powerful and is easier to use than CONCAT.

11.1.4 The INITCAP function

The INITCAP function reformats the case of the string argument, setting the first letter of each word to uppercase and the remainder of the letters in that word to lowercase. A word is a set of characters separated by a space or nonalphanumeric characters (such as # or _ ). The specification of INITCAP is:


FUNCTION INITCAP (string_in IN VARCHAR2) RETURN VARCHAR2

Here are some examples of the impact of INITCAP on your strings:

When and why would you use INITCAP? Many Oracle shops like to store all character string data in the database, such as names and addresses, in uppercase. This makes it easier to search for records that match certain criteria.

The problem with storing all the data in uppercase is that, while it is a convenient "machine format," it is not particularly readable or presentable. How easy is it to scan a page of information that looks like the following?


CUSTOMER TRACKING LIST - GENERATED ON 12-MAR-1994 LAST PAYMENT WEDNESDAY: PAUL JOHNSON, 123 MADISON AVE - $1200 LAST PAYMENT MONDAY: HARRY SIMMERSON, 555 AXELROD RD - $1500

It is hard for the eye to pick out the individual words and different types of information; all that text just blends in together. Furthermore, solid uppercase simply has a "machine" or even "mainframe" feel to it; you'd never actually type it that way. A mixture of upper- and lowercase can make your output much more readable and friendly in appearance:


Customer Tracking List - Generated On 12-Mar-1994 Last Payment Wednesday: Paul Johnson, 123 Madison Ave - $1200 Last Payment Monday: Harry Simmerson, 555 Axelrod Rd - $1500

Can you see any problems with using INITCAP to format output? There are a couple of drawbacks to the way it works. First, as you saw earlier with the string "BIG AND TALL", INITCAP is not very useful for generating titles, since it doesn't know that little words like "and" and "the" should not be capitalized. That is a relatively minor problem compared with the second: INITCAP is completely ignorant of real-world surname conventions. Names with internal capital letters, in particular, cannot be generated with INITCAP. Consider the following example:


INITCAP ('HAMBURGERS BY THE BILLIONS AT MCDONALDS') ==>    'Hamburgers By The Billions At Mcdonalds'

Use INITCAP with caution when printing reports or displaying data, since the information it produces may not always be formatted correctly.

11.1.5 The INSTR function

The INSTR function searches a string to find a match for the substring and, if found, returns the position, in the source string, of the first character of that substring. If there is no match, then INSTR returns 0. In Oracle7, if nth_appearance is not positive (i.e., if it is 0 or negative), then INSTR always returns 1. In Oracle8, a value of 0 or a negative number for nth_appearance causes INSTR to raise the VALUE_ERROR exception.

The specification of the INSTR function is:


FUNCTION INSTR    (string1 IN VARCHAR2,     string2 IN VARCHAR2    [,start_position IN NUMBER := 1    [, nth_appearance IN NUMBER := 1]]) RETURN NUMBER

where string1 is the string searched by INSTR for the position in which the nth_appearance of string2 is found. The start_position parameter is the position in the string where the search will start. It is optional and defaults to 1 (the beginning of string1). The nth_appearance parameter is also optional and also defaults to 1.

Both the start_position and nth_appearance parameters can be literals like 5 or 157, variables, or complex expressions, as follows:


INSTR (company_name, 'INC', (last_location + 5) * 10)

If start_position is negative, then INSTR counts back start_position number of characters from the end of the string and then searches from that point towards the beginning of the string for the nth match. Figure 11.1 illustrates the two directions in which INSTR searches, depending on the sign of the start_position parameter.

Figure 11.1: Forward and reverse searches with INSTR

figure 11.1

I have found INSTR to be a very handy function -- especially when used to the fullest extent possible. Most programmers make use of (and are even only aware of) only the first two parameters. Use INSTR to search from the end of the string? Search for the nth appearance, as opposed to just the first appearance? "Wow!" many programmers would say, "I didn't know it could do that." Take the time to get familiar with INSTR and use all of its power.

Let's look at some examples of INSTR. In these examples, you will see all four parameters used in all their permutations. As you write your own programs, keep in mind the different ways in which INSTR can be used to extract information from a string; it can greatly simplify the code you write to parse and analyze character data.

11.1.6 The LENGTH function

The LENGTH function returns the length of the specified string. The specification for LENGTH follows:


FUNCTION LENGTH (string1 VARCHAR2) RETURN NUMBER

If string1 is NULL, then LENGTH returns NULL -- not zero (0)! Remember, a NULL string is a "nonvalue." Therefore, it cannot have a length, even a zero length.

The LENGTH function, in fact, will never return zero; it will always return either NULL or a positive number. Here are some examples:


LENGTH (NULL) ==> NULL LENGTH ('') ==> NULL -- Same as a NULL string. LENGTH ('abcd') ==> 4 LENGTH ('abcd ') ==> 5

If string1 is a fixed-length CHAR datatype, then LENGTH counts the trailing blanks in its calculation. So the LENGTH of a fixed-length string is always the declared length of the string. If you want to compute the length of the nonblank characters in string1, you will need to use the RTRIM function to remove the trailing blanks (RTRIM is discussed later in this chapter). In the following example, length1 is set to 60 and length2 is set to 14.


DECLARE    company_name CHAR(60) := 'ACME PLUMBING';    length1 NUMBER;    length2 NUMBER; BEGIN    length1 := LENGTH (company_name);    length2 := LENGTH (RTRIM (company_name)); END;

LENGTHB is the multiple-byte character set version of LENGTH. For single-byte character sets (such as American English), LENGTH returns the same values as INSTR.

11.1.7 The LOWER function

The LOWER function converts all letters in the specified string to lowercase. The specifications for the LOWER function are:


FUNCTION LOWER (string1 IN CHAR) RETURN CHAR FUNCTION LOWER (string1 IN VARCHAR2) RETURN VARCHAR2

As I noted earlier, LOWER and UPPER will actually return a fixed-length string if the incoming string is fixed-length. LOWER will not change any characters in the string that are not letters, since case is irrelevant for numbers and special characters, such as the dollar sign ($).

Here are some examples of the effect of LOWER:


LOWER ('BIG FAT LETTERS') ==> 'big fat letters' LOWER ('123ABC') ==> '123abc'

LOWER and its partner in case conversion, UPPER, are useful for guaranteeing a consistent case when comparing strings. PL/SQL is not a case-sensitive language with regard to its own syntax and names of identifiers. It is sensitive to case, however, in character strings, whether found in named constants, literals, or variables. The string "ABC" is not the same as "abc", and this can cause problems in your programs if you are not careful and consistent in your handling of such values.

11.1.8 The LPAD function

By default, PL/SQL strips all trailing blanks from a character string unless it is declared with a fixed-length CHAR datatype. There are occasions, however, when you really want some spaces (or even some other character) added to the front or back of your string. LPAD (and its right-leaning cousin, RPAD) give you this capability. The LPAD function returns a string padded to the left (hence the "L" in "LPAD") to a specified length, and with a specified pad string. The specification of the LPAD function is:


FUNCTION LPAD    (string1 IN VARCHAR2,     padded_length IN NUMBER     [, pad_string IN VARCHAR2]) RETURN VARCHAR2

LPAD returns string1 padded on the left to length padded_length with the optional character string pad_string. If you do not specify pad_string, then string1 is padded on the left with spaces. You must specify the padded_length. If string1 already has a length equal to padded_length, then LPAD returns string1 without any additional characters. If padded_length is smaller than the length of string1, LPAD effectively truncates string1 -- it returns only the first padded_length characters of the incoming string1.

As you can easily see, LPAD can do a lot more than just add spaces to the left of a string. Let's look at some examples of how useful LPAD can be.

11.1.9 The LTRIM function

The LTRIM function is the opposite of LPAD. Whereas LPAD adds characters to the left of a string, LTRIM removes, or trims, characters from the leading portion of the string. And just as with LPAD, LTRIM offers much more flexibility than simply removing leading blanks. The specification of the LTRIM function is:


FUNCTION LTRIM (string1 IN VARCHAR2 [, trim_string IN VARCHAR2]) RETURN VARCHAR2

LTRIM returns string1 with all leading characters removed up to the first character not found in the trim_string. The second parameter is optional and defaults to a single space.

There is one important difference between LTRIM and LPAD. LPAD pads to the left with the specified string, and repeats that string (or pattern of characters) until there is no more room. LTRIM, on the other hand removes all leading characters which appear in the trim string, not as a pattern, but as individual candidates for trimming.

Here are some examples:

11.1.10 The REPLACE function

The REPLACE function returns a string in which all occurrences of a specified match string are replaced with a replacement string. REPLACE is useful for searching out patterns of characters and then changing all instances of that pattern in a single function call. The specification of the REPLACE function is:


FUNCTION REPLACE (string1 IN VARCHAR2, match_string IN VARCHAR2                   [, replace_string IN VARCHAR2]) RETURN VARCHAR2

If you do not specify the replacement string, then REPLACE simply removes all occurrences of the match_string in string1. If you specify neither a match string nor a replacement string, REPLACE returns NULL.

Here are several examples using REPLACE:

11.1.11 The RPAD function

The RPAD function adds characters to the end of a character string. It returns a string padded to the right (hence the "R" in "RPAD") to a specified length, and with an optional pad string. The specification of the RPAD function is:


FUNCTION RPAD    (string1 IN VARCHAR2,     padded_length IN NUMBER     [, pad_string IN VARCHAR2]) RETURN VARCHAR2

RPAD returns string1 padded on the right to length padded_length with the optional character string pad_string. If you do not specify pad_string, then string1 is padded on the right with spaces. You must specify the padded_length. If string1 already has a length equal to padded_length, then RPAD returns string1 without any additional characters. If padded_length is smaller than the length of string1, RPAD effectively truncates string1: it returns only the first padded_length characters of the incoming string1.

Let's look at some examples of RPAD:

You can use RPAD (and LPAD as well) to generate repetitive sequences of characters. I have used this technique in SQL*Reportwriter V1.1, where graphical objects like boxes are not really available. I can include the RPAD in a SELECT statement in the report, and then use the corresponding field in Text elements to provide lines to break up the data in a report.

11.1.12 The RTRIM function

The RTRIM function is the opposite of RPAD, and the companion function to LTRIM. Where RPAD adds characters to the right of a string, RTRIM removes, or trims, characters from the end portion of the string. Just as with RPAD, RTRIM offers much more flexibility than simply removing trailing blanks. The specification of the RTRIM function is:


FUNCTION RTRIM (string1 IN VARCHAR2 [, trim_string IN VARCHAR2]) RETURN VARCHAR2

RTRIM returns string1 with all trailing characters removed up to the first character not found in the trim_string. The second parameter is optional and defaults to a single space.

Here are some examples of RTRIM:

11.1.13 The SOUNDEX function

The SOUNDEX function allows you to perform string comparisons based on phonetics (the way a word sounds), as opposed to semantics (the way a word is spelled).[1] SOUNDEX returns a character string which is the "phonetic representation" of the argument. The specification of the SOUNDEX function is as follows:

[1] Oracle Corporation used the algorithm in The Art of Computer Programming, Volume 3, by Donald Knuth, to generate the phonetic representation.


FUNCTION SOUNDEX (string1 IN VARCHAR2) RETURN VARCHAR2

Here are some of the values SOUNDEX generated and how they vary according to the input string:


SOUNDEX ('smith') ==> 'S530' SOUNDEX ('SMYTHE') ==> ''S530' SOUNDEX ('smith smith') ==> 'S532' SOUNDEX ('smith z') ==> 'S532' SOUNDEX ('feuerstein') ==> 'F623' SOUNDEX ('feuerst') ==> 'F623'

Keep the following SOUNDEX rules in mind when using this function:

The SOUNDEX function is useful for ad hoc queries, and any other kinds of searches where the exact spelling of a database value is not known or easily determined.

11.1.14 The SUBSTR function

The SUBSTR function is one of the most useful and commonly used character functions. It allows you to extract a portion or subset of contiguous (connected) characters from a string. The substring is specified by starting position and a length.

The specification for the SUBSTR function is:


FUNCTION SUBSTR    (string_in IN VARCHAR2,     start_position_in IN NUMBER     [, substr_length_in IN NUMBER]) RETURN VARCHAR2

where the arguments are used as follows:

string_in

The source string

start_position_in

The starting position of the substring in string_in

substr_length_in

The length of the substring desired (the number of characters to be returned in the substring)

The last parameter, substr_length_in, is optional. If you do not specify a substring length, then SUBSTR returns all the characters to the end of string_in (from the starting position specified).

The start position cannot be zero. If the start position is less than zero, then the substring is retrieved from the back of the string. SUBSTR counts backwards substr_length_in number of characters from the end of string_in. In this case, however, the characters which are extracted are still to the right of the starting position. See Figure 11.2 for an illustration of how the different arguments are used by SUBSTR.

Figure 11.2: How arguments are used by SUBSTR

figure 11.2

The substr_length_in argument must be greater than zero.

You will find that in practice SUBSTR is very forgiving. Even if you violate the rules for the values of the starting position and the number of characters to be substringed, SUBSTR will not generate errors. Instead, for the most part, it will return NULL -- or the entire string -- as its answer.

Here are some examples of SUBSTR:

By the way, SUBSTRB is the multiple-byte character set version of SUBSTR. For single-byte character sets (such as American English), SUBSTRB returns the same values as SUBSTR.

11.1.15 The TRANSLATE function

The TRANSLATE function is a variation on REPLACE. REPLACE replaces every instance of a set of characters with another set of characters; that is, REPLACE works with entire words or patterns. TRANSLATE replaces single characters at a time, translating the nth character in the match set with the nth character in the replacement set. The specification of the TRANSLATE is as follows:


FUNCTION TRANSLATE    (string_in IN VARCHAR2,     search_set IN VARCHAR2,     replace_set VARCHAR2) RETURN VARCHAR2

where string_in is the string in which characters are to be translated, search_set is the set of characters to be translated (if found), and replace_set is the set of characters which will be placed in the string. Unlike REPLACE, where the last argument could be left off, you must include all three arguments when you use TRANSLATE. Any of the three arguments may, however, be NULL, in which case TRANSLATE always returns NULL.

Here are some examples of the effect of TRANSLATE:


TRANSLATE ('abcd', 'ab', '12') ==> '12cd' TRANSLATE ('12345', '15', 'xx') ==> 'x234x' TRANSLATE ('grumpy old possum', 'uot', '%$*') ==>   'gr%mpy $ld p$ss%m' TRANSLATE ('my language needs the letter e', 'egms', 'X') ==>    'y lanuaX nXXd thX lXttXr X'; TRANSLATE ('please go away', 'a', NULL) ==> NULL

You can deduce a number of the usage rules for TRANSLATE from the above examples, but I spell them out here:

The TRANSLATE function comes in handy when you need to change a whole set of characters in a string, regardless of the order in which they appear in the string. Section 11.2.5, "Verifying String Formats with TRANSLATE" demonstrates how handy this feature can be.

11.1.16 The UPPER function

The UPPER function converts all letters in the specified string to uppercase. The specifications of the UPPER function are:


FUNCTION UPPER (string1 IN CHAR) RETURN CHAR FUNCTION UPPER (string1 IN VARCHAR2) RETURN VARCHAR2

As I noted at the beginning of this chapter, UPPER and LOWER will actually return a fixed-length string if the incoming string is fixed length. UPPER will not change any characters in the string which are not letters, since case is irrelevant for numbers and special characters such as $.

Here are some examples of the effect of UPPER:


UPPER ('short little letters no more') ==> 'SHORT LITTLE LETTERS NO MORE' UPPER ('123abc') ==> '123ABC'

UPPER and its partner in case conversion, LOWER, are useful for guaranteeing a consistent case when comparing strings. PL/SQL is not a case-sensitive language as concerns its own syntax and names of identifiers. It is sensitive to case, however, in character strings, whether found in named constants, literals, or variables. The string "ABC" is not the same as "abc", and this can cause problems in your programs if you are not careful and consistent in your handling of such values.


III. Built-In Functions 11.2 Character Function Examples

Copyright (c) 2000 O'Reilly & Associates. All rights reserved.



Oracle PL/SQL Programming
Oracle PL/SQL Programming: Covers Versions Through Oracle Database 11g Release 2 (Animal Guide)
ISBN: 0596514468
EAN: 2147483647
Year: 2004
Pages: 234
Authors: Steven Feuerstein, Bill Pribyl
BUY ON AMAZON

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net