4.19 Character Set Support in the HLA Standard Library

The HLA Standard Library provides several character set routines you may find useful. The character set support routines fall into four categories: standard character set functions, character set tests, character set conversions, and character set I/O. This section describes these routines in the HLA Standard Library.

To begin with, let's consider the Standard Library routines that help you construct character sets. These routines include: cs.empty, cs.cpy, cs.charToCset, cs.unionChar, cs.removeChar, cs.rangeChar, cs.strToCset, and cs.unionStr. These procedures let you build up character sets at run-time using character and string objects.

The cs.empty procedure initializes a character set variable to the empty set by setting all the bits in the character set to zero. This procedure call uses the following syntax (CSvar is a character set variable):

 cs.empty( CSvar );

The cs.cpy procedure copies one character set to another, replacing any data previously held by the destination character set. The syntax for cs.cpy is

 cs.cpy( srcCsetValue, destCsetVar );

The cs.cpy source character set can be either a character set constant or a character set variable. The destination character set must be a character set variable.

The cs.unionChar procedure adds a character to a character set. It uses the following calling sequence:

 cs.unionChar( CharVar, CSvar );

This call will add the first parameter, a character, to the set via set union. Note that you could use the bts instruction to achieve this same result, although the cs.unionChar call is often more convenient (though slower).

The cs.charToCset function creates a singleton set (a set containing a single character). The calling format for this function is

 cs.charToCset( CharValue, CSvar );

The first operand, the character value CharValue, can be an 8-bit register, a constant, or a character variable. The second operand (CSvar) must be a character set variable. This function clears the destination character set to all zeros and then unions the specified character into the character set.

The cs.removeChar procedure lets you remove a single character from a character set without affecting the other characters in the set. This function uses the same syntax as cs.charToCset and the parameters have the same attributes. The calling sequence is

 cs.removeChar( CharValue, CSVar );

Note that if the character was not in the CSVar set to begin with, cs.removeChar will not affect the set.
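The construction routines described so far can be combined as in the following minimal sketch. It assumes the usual stdlib.hhf umbrella header and the calling sequences shown above; the program name and the variable names (digitSet, copySet, oneCharSet) are made up for illustration.

```hla
program buildCsetDemo;
#include( "stdlib.hhf" )

static
    digitSet:   cset;
    copySet:    cset;
    oneCharSet: cset;

begin buildCsetDemo;

    // Start from the empty set and add characters one at a time.
    cs.empty( digitSet );
    cs.unionChar( '0', digitSet );
    cs.unionChar( '1', digitSet );
    cs.unionChar( '2', digitSet );

    // Copy the set, then remove one member from the copy only.
    cs.cpy( digitSet, copySet );
    cs.removeChar( '1', copySet );

    // Removing a character that is not present leaves the set unchanged.
    cs.removeChar( 'z', copySet );

    // Build a singleton set; any earlier contents of oneCharSet are discarded.
    cs.charToCset( 'x', oneCharSet );

    stdout.put( "digitSet:   ", digitSet, nl );
    stdout.put( "copySet:    ", copySet, nl );
    stdout.put( "oneCharSet: ", oneCharSet, nl );

end buildCsetDemo;
```

Because cs.cpy makes an independent copy, the cs.removeChar calls affect copySet only; digitSet should still contain all three digits when it is printed.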

The cs.rangeChar procedure constructs a character set containing all the characters between two characters you pass as parameters. This function sets all bits outside the range of these two characters to zero. The calling sequence is

 cs.rangeChar( LowerBoundChar, UpperBoundChar, CSVar );

The LowerBoundChar and UpperBoundChar parameters can be constants, registers, or character variables. CSVar, the destination character set, must be a cset variable.
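As a rough illustration, the sketch below calls cs.rangeChar twice on the same destination, once with character variables and once with constants. The names lowBound, highBound, and rangeSet are hypothetical, and the stdlib.hhf include is assumed.

```hla
program rangeCharDemo;
#include( "stdlib.hhf" )

static
    lowBound:  char := 'a';
    highBound: char := 'f';
    rangeSet:  cset;

begin rangeCharDemo;

    // Bounds supplied as character variables.
    cs.rangeChar( lowBound, highBound, rangeSet );
    stdout.put( "letters: ", rangeSet, nl );

    // Bounds supplied as constants. Bits outside the new range are
    // cleared, so this call replaces the earlier contents of rangeSet
    // rather than merging with them.
    cs.rangeChar( '0', '9', rangeSet );
    stdout.put( "digits:  ", rangeSet, nl );

end rangeCharDemo;
```

If you want both ranges in one set, build the second range in a temporary cset variable and merge the two with a set union.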

The cs.strToCset procedure creates a new character set containing the union of all the characters in a character string. This procedure begins by setting the destination character set to the empty set and then it unions in the characters in the string one by one until it exhausts all characters in the string. The calling sequence is

 cs.strToCset( StringValue, CSVar );

Technically, the StringValue parameter can be a string constant as well as a string variable; however, it doesn't make any sense to call cs.strToCset like this because cs.cpy is a much more efficient way to initialize a character set with a constant set of characters. As usual, the destination character set must be a cset variable. Typically, you'd use this function to create a character set based on a string input by the user.

The cs.unionStr procedure will add the characters in a string to an existing character set. Like cs.strToCset, you'd normally use this function to union characters into a set based on a string input by the user. The calling sequence for this is

 cs.unionStr( StringValue, CSVar );
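The difference between the two string routines can be sketched as follows. To keep the example self-contained, two statically initialized string variables (firstStr and secondStr, both hypothetical names) stand in for strings that a real program would typically read from the user.

```hla
program strCsetDemo;
#include( "stdlib.hhf" )

static
    firstStr:  string := "hello";
    secondStr: string := "world";
    seenChars: cset;

begin strCsetDemo;

    // cs.strToCset first empties seenChars, then adds each character of
    // firstStr. Duplicate characters (the two l's) occupy a single member.
    cs.strToCset( firstStr, seenChars );
    stdout.put( "after strToCset: ", seenChars, nl );

    // cs.unionStr keeps the existing members and adds those of secondStr.
    cs.unionStr( secondStr, seenChars );
    stdout.put( "after unionStr:  ", seenChars, nl );

end strCsetDemo;
```

Calling cs.strToCset a second time instead of cs.unionStr would discard the characters collected from firstStr.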

Standard set operations include union, intersection, and set difference. The HLA Standard Library routines cs.setunion, cs.intersection, and cs.difference provide these operations, respectively.^[25] These routines all use the same calling sequence:

 cs.setunion( srcCset, destCset );
 cs.intersection( srcCset, destCset );
 cs.difference( srcCset, destCset );

The first parameter can be a character set constant or a character set variable. The second parameter must be a character set variable. These procedures compute "destCset := destCset op srcCset" where op represents set union, intersection, or difference, depending on the function call.
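The following sketch applies all three operations to one destination variable so you can follow how each call updates it. The variable name workSet is an assumption for illustration; the character set constants use HLA's brace notation.

```hla
program setOpsDemo;
#include( "stdlib.hhf" )

static
    workSet: cset;

begin setOpsDemo;

    // workSet := {'a'..'f'}
    cs.cpy( {'a'..'f'}, workSet );

    // workSet := workSet + {'0'..'9'}, giving the letters a-f and all digits.
    cs.setunion( {'0'..'9'}, workSet );

    // workSet := workSet * {'a'..'c', '0'..'3'}, keeping only a, b, c, 0, 1, 2, 3.
    cs.intersection( {'a'..'c', '0'..'3'}, workSet );

    // workSet := workSet - {'a', '0'}, which should leave b, c, 1, 2, 3.
    cs.difference( {'a', '0'}, workSet );

    stdout.put( "workSet: ", workSet, nl );

end setOpsDemo;
```

Note that the operand order matters only for cs.difference; union and intersection are commutative, but difference removes the members of the source set from the destination.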

The next category of character set routines tests character sets in various ways. These routines typically return a boolean value indicating the result of the test. The HLA character set routines in this category include cs.IsEmpty, cs.member, cs.subset, cs.psubset, cs.superset, cs.psuperset, cs.eq, and cs.ne.

The cs.IsEmpty function tests a character set to see if it is the empty set. The function returns true or false in the EAX register. This function uses the following calling sequence:

 cs.IsEmpty( CSetValue );

The single parameter may be a constant or a character set variable, although it doesn't make much sense to pass a character set constant to this procedure (because you would know at compile time whether this set is empty).

The cs.member function tests to see if a character value is a member of a set. This function returns true in the EAX register if the supplied character is a member of the specified set. Note that you can use the bt instruction to (more efficiently) test this same condition. However, the cs.member function is probably a little more convenient to use. The calling sequence for cs.member is

 cs.member( CharValue, CsetValue );

The first parameter is a register, a character variable, or a constant. The second parameter is either a character set constant or a character set variable. It would be unusual for both parameters to be constants.
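A typical use of cs.member is as the condition of an if statement, where HLA tests the function's result for you. The sketch below assumes a statically initialized set named vowels and a character variable named testChar, both invented for this example.

```hla
program memberDemo;
#include( "stdlib.hhf" )

static
    vowels:   cset := {'a', 'e', 'i', 'o', 'u'};
    testChar: char := 'e';

begin memberDemo;

    // Character variable tested against a character set variable.
    if( cs.member( testChar, vowels ) ) then

        stdout.put( "'", testChar, "' is a vowel", nl );

    else

        stdout.put( "'", testChar, "' is not a vowel", nl );

    endif;

    // Character variable tested against a character set constant.
    if( cs.member( testChar, {'0'..'9'} ) ) then

        stdout.put( "'", testChar, "' is a digit", nl );

    endif;

end memberDemo;
```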

The cs.subset, cs.psubset (proper subset), cs.superset, and cs.psuperset (proper superset) functions let you check to see if one character set is a subset or superset of another. These four routines share the same calling sequence:

 cs.subset( CsetValue1, CsetValue2 );
 cs.psubset( CsetValue1, CsetValue2 );
 cs.superset( CsetValue1, CsetValue2 );
 cs.psuperset( CsetValue1, CsetValue2 );

These routines compare the first parameter against the second parameter and return true or false in the EAX register depending upon the result of the comparison. One set is a subset of another if all the members of the first character set are present in the second character set. It is a proper subset if the second character set also contains characters not found in the first (left) character set. Likewise, one character set is a superset of another if it contains all the characters in the second (right) set (and, possibly, more). A proper superset contains additional characters above and beyond those found in the second set. The parameters can be either character set variables or character set constants; however, it would be unusual for both parameters to be character set constants (because you can determine this at compile time, there would be no need to call a run-time function to compute this).

The cs.eq and cs.ne functions check to see if two sets are equal or not equal. These functions return true or false in EAX depending upon the set comparison. The calling sequence is identical to the subset/superset functions above:

 cs.eq( CsetValue1, CsetValue2 );
 cs.ne( CsetValue1, CsetValue2 );
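The comparison routines can be exercised with two small sets, as in this sketch. The names smallSet and largeSet are hypothetical; the comments describe the results you would expect from the definitions given above.

```hla
program compareDemo;
#include( "stdlib.hhf" )

static
    smallSet: cset := {'a'..'c'};
    largeSet: cset := {'a'..'f'};

begin compareDemo;

    // Every member of smallSet is in largeSet, and largeSet has more.
    if( cs.subset( smallSet, largeSet ) ) then
        stdout.put( "smallSet is a subset of largeSet", nl );
    endif;
    if( cs.psubset( smallSet, largeSet ) ) then
        stdout.put( "smallSet is a proper subset of largeSet", nl );
    endif;

    // The superset tests read in the opposite direction.
    if( cs.psuperset( largeSet, smallSet ) ) then
        stdout.put( "largeSet is a proper superset of smallSet", nl );
    endif;

    // A set is a subset of itself, but not a proper subset.
    if( cs.psubset( smallSet, smallSet ) ) then
        stdout.put( "unexpected: proper subset of itself", nl );
    else
        stdout.put( "smallSet is not a proper subset of itself", nl );
    endif;

    // Equality and inequality.
    if( cs.ne( smallSet, largeSet ) ) then
        stdout.put( "the two sets differ", nl );
    endif;
    if( cs.eq( smallSet, smallSet ) ) then
        stdout.put( "a set equals itself", nl );
    endif;

end compareDemo;
```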

The cs.extract routine removes an arbitrary character from a character set and returns that character in the EAX register.^[26] The calling sequence is the following:

 cs.extract( CsetVar );

The single parameter must be a character set variable. Note that this function will modify the character set variable by removing some character from the character set. This function returns $FFFF_FFFF (-1) in EAX if the character set was empty prior to the call.
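Because cs.extract removes the character it returns, calling it in a loop is one way to visit every member of a set. The sketch below works on a copy so the original set survives; the names sourceSet and workSet are assumptions, and the order in which characters come out is whatever cs.extract chooses.

```hla
program extractDemo;
#include( "stdlib.hhf" )

static
    sourceSet: cset := {'x', 'y', 'z'};
    workSet:   cset;

begin extractDemo;

    // cs.extract modifies its argument, so operate on a copy.
    cs.cpy( sourceSet, workSet );

    forever

        cs.extract( workSet );

        // $FFFF_FFFF in EAX signals that the set was already empty.
        breakif( eax = $FFFF_FFFF );

        // Otherwise the extracted character is in AL.
        stdout.put( "extracted: ", (type char al), nl );

    endfor;

    // Every member has been removed, so the copy should now be empty.
    if( cs.IsEmpty( workSet ) ) then
        stdout.put( "workSet is now empty", nl );
    endif;

    stdout.put( "sourceSet is unchanged: ", sourceSet, nl );

end extractDemo;
```

Testing the full EAX value rather than AL alone is what distinguishes the end-of-set marker from a legitimately extracted character whose code is $FF.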

In addition to the routines found in the cset.hhf (character set) library module, the string and standard output modules also provide functions that allow or expect character set parameters. For example, if you supply a character set value as a parameter to stdout.put, the stdout.put routine will print the characters currently in the set. See the HLA Standard Library documentation for more details on character set–handling procedures.

^[25]cs.setunion was used rather than cs.union because "union" is an HLA reserved word.

^[26]This routine returns the character in AL and zeros out the H.O. three bytes of EAX.