4.13 The HLA String Module and Other String-Related Routines


Although HLA provides a powerful definition for string data, the real power behind HLA's string capabilities lies in the HLA Standard Library, not in the definition of HLA string data. HLA provides several dozen string manipulation routines that far exceed the capabilities found in standard high level languages like C/C++, Java, or Pascal; indeed, HLA's string handling capabilities rival those in string processing languages like Icon or SNOBOL4. While it is premature to introduce all of HLA's character string handling routines, this chapter will discuss many of the string facilities that HLA provides.

Perhaps the most basic string operation you will need is to assign one string to another. There are three different ways to assign strings in HLA: by reference, by copying a string, and by duplicating a string. Of these, assignment by reference is the fastest and easiest. If you have two strings and you wish to assign one string to the other, a simple and fast way to do this is to copy the string pointer. The following code fragment demonstrates this:

    static
        string1:    string  := "Some String Data";
        string2:    string;
            .
            .
            .
        mov( string1, eax );
        mov( eax, string2 );
            .
            .
            .

String assignment by reference is very efficient because it only involves two simple mov instructions regardless of the string length. Assignment by reference works great if you never modify the string data after the assignment operation. Do keep in mind, though, that both string variables (string1 and string2 in the example above) wind up pointing at the same data. So if you make a change to the data pointed at by one string variable, you will change the string data pointed at by the second string object because both objects point at the same data. Listing 4-13 provides a program that demonstrates this problem.

Listing 4-13: Problem with String Assignment by Copying Pointers.

    // Program to demonstrate the problem
    // with string assignment by reference.

    program strRefAssignDemo;
    #include( "stdlib.hhf" );

    static
        string1:    string;
        string2:    string;

    begin strRefAssignDemo;

        // Get a value into string1

        forever

            stdout.put( "Enter a string with at least three characters: " );
            stdin.a_gets();
            mov( eax, string1 );
            breakif( (type str.strRec [eax]).length >= 3 );
            stdout.put( "Please enter a string with at least three chars." nl );

        endfor;

        stdout.put( "You entered: '", string1, "'" nl );

        // Do the string assignment by copying the pointer

        mov( string1, ebx );
        mov( ebx, string2 );

        stdout.put( "String1= '", string1, "'" nl );
        stdout.put( "String2= '", string2, "'" nl );

        // Okay, modify the data in string1 by overwriting
        // the first three characters of the string (note that
        // a string pointer always points at the first character
        // position in the string and we know we've got at least
        // three characters here).

        mov( 'a', (type char [ebx]) );
        mov( 'b', (type char [ebx+1]) );
        mov( 'c', (type char [ebx+2]) );

        // Okay, demonstrate the problem with assignment via
        // pointer copy.

        stdout.put
        (
            "After assigning 'abc' to the first three characters in string1:"
            nl
            nl
        );
        stdout.put( "String1= '", string1, "'" nl );
        stdout.put( "String2= '", string2, "'" nl );

        strfree( string1 );     // Don't free string2 as well!

    end strRefAssignDemo;

Because both string1 and string2 point at the same string data in this example, any change you make to one string is reflected in the other. While this is sometimes acceptable, most programmers expect assignment to produce a different copy of a string; that is, they expect the semantics of string assignment to produce two unique copies of the string data.

An important point to remember when using copy by reference (this term means copying a pointer) is that you have created an alias to the string data. The term "alias" means that you have two names for the same object in memory (e.g., in the program above, string1 and string2 are two different names for the same string data). When you read a program it is reasonable to expect that different variables refer to different memory objects. Aliases violate this rule, thus making your program harder to read and understand because you've got to remember that aliases do not refer to different objects in memory. Failing to keep this in mind can lead to subtle bugs in your program. For instance, in the example above you have to remember that string1 and string2 are aliases so as not to free both objects at the end of the program. Worse still, you have to remember that string1 and string2 are aliases so that you don't continue to use string2 after freeing string1 in this code because string2 would be a dangling reference at that point.

Because using copy by reference makes your programs harder to read and increases the possibility that you might introduce subtle defects in your programs, you might wonder why someone would use copy by reference at all. There are two reasons for this: First, copy by reference is very efficient; it only involves the execution of two mov instructions. Second, some algorithms actually depend on copy by reference semantics. Nevertheless, you should carefully consider whether copying string pointers is the appropriate way to do a string assignment in your program before using this technique.

The second way to assign one string to another is to copy the string data. The HLA Standard Library str.cpy routine provides this capability. A call to the str.cpy procedure uses the following call syntax:[16]

 str.cpy( source_string, destination_string ); 

The source and destination strings must be string variables (pointers) or 32-bit registers containing the addresses of the string data in memory.

The str.cpy routine first checks the maximum length field of the destination string to ensure that it is at least as big as the current length of the source string. If it is not, then str.cpy raises the ex.StringOverflow exception. If the destination string's maximum length is large enough, then str.cpy copies the string length, the characters, and the zero terminating byte from the source string to the destination string's data area. When this process is complete, the two strings point at identical data, but they do not point at the same data in memory.[17] The program in Listing 4-14 is a rework of the example in Listing 4-13 using str.cpy rather than copy by reference.

Listing 4-14: Copying Strings Using str.cpy.

    // Program to demonstrate string assignment using str.cpy.

    program strcpyDemo;
    #include( "stdlib.hhf" );

    static
        string1:    string;
        string2:    string;

    begin strcpyDemo;

        // Allocate storage for string2:

        stralloc( 64 );
        mov( eax, string2 );

        // Get a value into string1

        forever

            stdout.put( "Enter a string with at least three characters: " );
            stdin.a_gets();
            mov( eax, string1 );
            breakif( (type str.strRec [eax]).length >= 3 );
            stdout.put( "Please enter a string with at least three chars." nl );

        endfor;

        // Do the string assignment via str.cpy

        str.cpy( string1, string2 );

        stdout.put( "String1= '", string1, "'" nl );
        stdout.put( "String2= '", string2, "'" nl );

        // Okay, modify the data in string1 by overwriting
        // the first three characters of the string (note that
        // a string pointer always points at the first character
        // position in the string and we know we've got at least
        // three characters here).

        mov( string1, ebx );
        mov( 'a', (type char [ebx]) );
        mov( 'b', (type char [ebx+1]) );
        mov( 'c', (type char [ebx+2]) );

        // Okay, demonstrate that we have two different strings
        // because we used str.cpy to copy the data:

        stdout.put
        (
            "After assigning 'abc' to the first three characters in string1:"
            nl
            nl
        );
        stdout.put( "String1= '", string1, "'" nl );
        stdout.put( "String2= '", string2, "'" nl );

        // Note that we have to free the data associated with both
        // strings because they are not aliases of one another.

        strfree( string1 );
        strfree( string2 );

    end strcpyDemo;

There are two important things to note about the program in Listing 4-14. First, this program begins by allocating storage for string2. Remember, the str.cpy routine does not allocate storage for the destination string; it assumes that the destination string already has storage allocated. Keep in mind that str.cpy does not initialize string2; it only copies data to the location where string2 is pointing. It is the program's responsibility to initialize the string by allocating sufficient memory before calling str.cpy. The second thing to notice is that the program calls strfree to free up the storage for both string1 and string2 before the program quits.

Allocating storage for a string variable prior to calling str.cpy is so common that the HLA Standard Library provides a routine that allocates and copies the string: str.a_cpy. This routine uses the following call syntax:

 str.a_cpy( source_string ); 

Note that there is no destination string. This routine looks at the length of the source string, allocates sufficient storage, makes a copy of the string, and then returns a pointer to the new string in the EAX register. The program in Listing 4-15 demonstrates how to do the same thing as the program in Listing 4-14 using the str.a_cpy procedure.

Listing 4-15: Copying Strings Using str.a_cpy.

    // Program to demonstrate string assignment using str.a_cpy.

    program stra_cpyDemo;
    #include( "stdlib.hhf" );

    static
        string1:    string;
        string2:    string;

    begin stra_cpyDemo;

        // Get a value into string1

        forever

            stdout.put( "Enter a string with at least three characters: " );
            stdin.a_gets();
            mov( eax, string1 );
            breakif( (type str.strRec [eax]).length >= 3 );
            stdout.put( "Please enter a string with at least three chars." nl );

        endfor;

        // Do the string assignment via str.a_cpy

        str.a_cpy( string1 );
        mov( eax, string2 );

        stdout.put( "String1= '", string1, "'" nl );
        stdout.put( "String2= '", string2, "'" nl );

        // Okay, modify the data in string1 by overwriting
        // the first three characters of the string (note that
        // a string pointer always points at the first character
        // position in the string and we know we've got at least
        // three characters here).

        mov( string1, ebx );
        mov( 'a', (type char [ebx]) );
        mov( 'b', (type char [ebx+1]) );
        mov( 'c', (type char [ebx+2]) );

        // Okay, demonstrate that we have two different strings
        // because we used str.a_cpy to copy the data:

        stdout.put
        (
            "After assigning 'abc' to the first three characters in string1:"
            nl
            nl
        );
        stdout.put( "String1= '", string1, "'" nl );
        stdout.put( "String2= '", string2, "'" nl );

        // Note that we have to free the data associated with both
        // strings because they are not aliases of one another.

        strfree( string1 );
        strfree( string2 );

    end stra_cpyDemo;

Caution

Whenever you use copy by reference or str.a_cpy to assign a string, don't forget to free the storage associated with the string when you are (completely) done with that string's data. Failure to do so may produce a memory leak if you do not have another pointer to the previous string data lying around.

Obtaining the length of a character string is so common that the HLA Standard Library provides a str.length routine specifically for this purpose. Of course, you can fetch the length by using the str.strRec data type to access the length field directly, but constant use of this mechanism can be tiring because it involves a lot of typing. The str.length routine provides a more compact and convenient way to fetch the length information. You call str.length using one of the following two formats:

    str.length( Reg32 );
    str.length( string_variable );

This routine returns the current string length in the EAX register.
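As a minimal sketch of the two approaches just described, the following program fetches the length of the same string once through str.length and once through the length field of str.strRec. The program name strLengthDemo and the variable s are made up for this illustration; both lines of output should report the same value.

```hla
// Sketch: two ways to fetch a string's length.
// (strLengthDemo and s are illustrative names.)

program strLengthDemo;
#include( "stdlib.hhf" );

static
    s:  string := "Hello";

begin strLengthDemo;

    // Fetch the length by calling the library routine.

    str.length( s );            // EAX := current length of s
    stdout.put( "str.length returns: ", (type uns32 eax), nl );

    // Fetch the same value directly from the length field.

    mov( s, ebx );
    mov( (type str.strRec [ebx]).length, eax );
    stdout.put( "length field holds: ", (type uns32 eax), nl );

end strLengthDemo;
```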

Another pair of useful string routines are the str.cat and str.a_cat procedures. They use the following syntax:

    str.cat( srcRStr, destLStr );
    str.a_cat( srcLStr, srcRStr );

These two routines concatenate two strings (that is, they create a new string by joining the two strings together). The str.cat procedure concatenates the source string to the end of the destination string. Before the concatenation actually takes place, str.cat checks to make sure that the destination string is large enough to hold the concatenated result; it raises the ex.StringOverflow exception if the destination string's maximum length is too small.

The str.a_cat routine, as its name suggests, allocates storage for the resulting string before doing the concatenation. This routine allocates sufficient storage to hold the concatenated result and then copies srcLStr to the allocated storage. Finally, it appends the string data pointed at by srcRStr to the end of this new string and returns a pointer to the new string in the EAX register.

Caution

Note a potential source of confusion. The str.cat procedure concatenates its first operand to the end of the second operand. Therefore, str.cat follows the standard (src, dest) operand format present in many HLA statements. The str.a_cat routine, on the other hand, has two source operands rather than a source and destination operand. The str.a_cat routine concatenates its two operands in an intuitive left-to-right fashion. This is the opposite of str.cat. Keep this in mind when using these two routines.

Listing 4-16 demonstrates the use of the str.cat and str.a_cat routines:

Listing 4-16: Demonstration of str.cat and str.a_cat Routines.

    // Program to demonstrate str.cat and str.a_cat.

    program strcatDemo;
    #include( "stdlib.hhf" );

    static
        UserName:   string;
        Hello:      string;
        a_Hello:    string;

    begin strcatDemo;

        // Allocate storage for the concatenated result:

        stralloc( 1024 );
        mov( eax, Hello );

        // Get some user input to use in this example:

        stdout.put( "Enter your name: " );
        stdin.flushInput();
        stdin.a_gets();
        mov( eax, UserName );

        // Use str.cat to combine the two strings:

        str.cpy( "Hello ", Hello );
        str.cat( UserName, Hello );

        // Use str.a_cat to combine the two strings:

        str.a_cat( "Hello ", UserName );
        mov( eax, a_Hello );

        stdout.put( "Concatenated string #1 is '", Hello, "'" nl );
        stdout.put( "Concatenated string #2 is '", a_Hello, "'" nl );

        strfree( UserName );
        strfree( a_Hello );
        strfree( Hello );

    end strcatDemo;

The str.insert and str.a_insert routines are similar to the string concatenation procedures. However, the str.insert and str.a_insert routines let you insert one string anywhere into another string, not just at the end of the string. The calling sequences for these two routines are

    str.insert( src, dest, index );
    str.a_insert( StrToInsert, StrToInsertInto, index );

These two routines insert the source string (src or StrToInsert) into the destination string (dest or StrToInsertInto) starting at character position index. The str.insert routine inserts the source string directly into the destination string; if the destination string is not large enough to hold both strings, str.insert raises an ex.StringOverflow exception. The str.a_insert routine first allocates a new string on the heap, copies the destination string (StrToInsertInto) to the new string, and then inserts the source string (StrToInsert) into this new string at the specified offset; str.a_insert returns a pointer to the new string in the EAX register.

Indexes into a string are zero based. This means that if you supply the value zero as the index in str.insert or str.a_insert, then these routines will insert the source string before the first character of the destination string. Likewise, if the index is equal to the length of the string, then these routines will simply concatenate the source string to the end of the destination string.

Note

If the index is greater than the length of the string, the str.insert and str.a_insert procedures will not raise an exception; instead, they will simply append the source string to the end of the destination string.
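The following sketch shows both insertion routines, using the operand order given in this section (other versions of the HLA Standard Library may order the operands differently, so check the documentation for the version you have). The program name and the variables dest and newStr are invented for the example. If the routines behave as described above, the first output line should show 'Hello Big World' and the second 'Hello World'.

```hla
// Sketch: str.insert versus str.a_insert.
// (strInsertDemo, dest, and newStr are illustrative names.)

program strInsertDemo;
#include( "stdlib.hhf" );

static
    dest:       string;
    newStr:     string;

begin strInsertDemo;

    // str.insert modifies its destination in place, so the
    // destination needs enough storage for the combined result.

    stralloc( 64 );
    mov( eax, dest );
    str.cpy( "Hello World", dest );

    // Insert at index 6, just before the 'W'
    // (indexes are zero based).

    str.insert( "Big ", dest, 6 );
    stdout.put( "str.insert:   '", dest, "'" nl );

    // str.a_insert leaves both operands alone and returns
    // a newly allocated string in EAX. Index zero places
    // the inserted text before the first character.

    str.a_insert( "Hello ", "World", 0 );
    mov( eax, newStr );
    stdout.put( "str.a_insert: '", newStr, "'" nl );

    strfree( newStr );
    strfree( dest );

end strInsertDemo;
```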

The str.delete and str.a_delete routines let you remove characters from a string. They use the following calling sequence:

    str.delete( strng, StartIndex, Length );
    str.a_delete( strng, StartIndex, Length );

Both routines delete Length characters starting at character position StartIndex in string strng. The difference between the two is that str.delete deletes the characters directly from strng, whereas str.a_delete first allocates storage and copies strng, then deletes the characters from the new string (leaving strng untouched). The str.a_delete routine returns a pointer to the new string in the EAX register.

The str.delete and str.a_delete routines are very forgiving with respect to the values you pass in StartIndex and Length. If StartIndex is greater than the current length of the string, these routines do not delete any characters from the string. If StartIndex is less than the current length of the string, but StartIndex+Length is greater than the length of the string, then these routines will delete all characters from StartIndex to the end of the string.
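Here is a short sketch of both deletion routines, including the forgiving treatment of an oversized Length value. The names strDeleteDemo, s, and t are assumptions made for this example. Assuming the behavior described above, the program should print 'Hello World' (from str.a_delete) and then 'Hello' (from str.delete).

```hla
// Sketch: str.a_delete versus str.delete.
// (strDeleteDemo, s, and t are illustrative names.)

program strDeleteDemo;
#include( "stdlib.hhf" );

static
    s:      string;
    t:      string;

begin strDeleteDemo;

    // Make a heap copy so that str.delete can modify it in place.

    str.a_cpy( "Hello Big World" );
    mov( eax, s );

    // str.a_delete leaves s untouched and returns a new string:
    // remove the four characters "Big " starting at index 6.

    str.a_delete( s, 6, 4 );
    mov( eax, t );
    stdout.put( "str.a_delete: '", t, "'" nl );

    // str.delete changes s itself. StartIndex+Length runs past
    // the end of the string here, so everything from index 5
    // onward is removed.

    str.delete( s, 5, 100 );
    stdout.put( "str.delete:   '", s, "'" nl );

    strfree( t );
    strfree( s );

end strDeleteDemo;
```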

Another very common string operation is the need to copy a portion of a string to another string without otherwise affecting the source string. The str.substr and str.a_substr routines provide this capability. These routines use the following syntax:

    str.substr( src, dest, StartIndex, Length );
    str.a_substr( src, StartIndex, Length );

The str.substr routine copies Length characters, starting at position StartIndex, from the src string to the dest string. The dest string must have sufficient storage to hold the new string or str.substr will raise an ex.StringOverflow exception. If the StartIndex value is greater than the length of the string, then str.substr will raise an ex.StringIndexError exception. If StartIndex+Length is greater than the length of the source string, but StartIndex is less than the length of the string, then str.substr will extract only those characters from StartIndex to the end of the string.

The str.a_substr procedure behaves in a fashion nearly identical to str.substr except that it allocates storage on the heap for the destination string. Other than the fact that overflow never occurs, str.a_substr handles exceptions identically to str.substr.[18] As you can probably guess by now, str.a_substr returns a pointer to the newly allocated string in the EAX register.
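The sketch below exercises both substring routines and traps the index exception with HLA's try..endtry statement. The names strSubstrDemo, src, dest, and newStr are hypothetical, and the operand order follows this section. Under the semantics described above, the first two lines should show 'World' and 'Hello', and the third call should land in the exception handler.

```hla
// Sketch: str.substr, str.a_substr, and ex.StringIndexError.
// (strSubstrDemo, src, dest, and newStr are illustrative names.)

program strSubstrDemo;
#include( "stdlib.hhf" );

static
    src:        string := "Hello World";
    dest:       string;
    newStr:     string;

begin strSubstrDemo;

    // str.substr needs a destination with storage already allocated.

    stralloc( 16 );
    mov( eax, dest );

    // Copy five characters starting at index 6.

    str.substr( src, dest, 6, 5 );
    stdout.put( "str.substr:   '", dest, "'" nl );

    // str.a_substr allocates the result and returns it in EAX.

    str.a_substr( src, 0, 5 );
    mov( eax, newStr );
    stdout.put( "str.a_substr: '", newStr, "'" nl );

    // A start index beyond the end of the source raises an exception.

    try

        str.substr( src, dest, 20, 2 );

    exception( ex.StringIndexError )

        stdout.put( "Start index is past the end of the source" nl );

    endtry;

    strfree( newStr );
    strfree( dest );

end strSubstrDemo;
```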

After you begin working with string data for a little while, the need will invariably arise to compare two strings. A first attempt at string comparison, using the standard HLA relational operators, will compile but not necessarily produce the desired result:

    mov( s1, eax );
    if( eax = s2 ) then

        << code to execute if the strings are equal >>

    else

        << code to execute if the strings are not equal >>

    endif;

As just stated, this code will compile and execute just fine. However, it's probably not doing what you expect it to do. Remember, strings are pointers. This code compares the two pointers to see if they are equal. If they are equal, clearly the two strings are equal (because both s1 and s2 point at the exact same string data). However, the fact that the two pointers are different doesn't necessarily mean that the strings are not equivalent. Both s1 and s2 could contain different values (that is, they point at different addresses in memory), yet the string data at those two different addresses could be identical. Most programmers expect a string comparison for equality to be true if the data for the two strings is the same. Clearly a pointer comparison does not provide this type of comparison. To overcome this problem, the HLA Standard Library provides a set of string comparison routines that will compare the string data, not just their pointers. These routines use the following calling sequences:

    str.eq( src1, src2 );
    str.ne( src1, src2 );
    str.lt( src1, src2 );
    str.le( src1, src2 );
    str.gt( src1, src2 );
    str.ge( src1, src2 );

Each of these routines compares the src1 string to the src2 string and returns true (1) or false (0) in the EAX register depending on the comparison. For example, "str.eq( s1, s2 );" returns true in EAX if s1 is equal to s2. HLA provides a small extension that allows you to use the string comparison routines within an if statement.[19] The following code demonstrates the use of some of these comparison routines within an if statement:

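    stdout.put( "Enter a single word: " );
    stdin.a_gets();

    // str.eq returns its result in EAX, which overwrites the
    // pointer that stdin.a_gets returned there. Keep a copy of
    // the pointer in EBX so the string can still be freed.

    mov( eax, ebx );
    if( str.eq( ebx, "Hello" )) then

        stdout.put( "You entered 'Hello'", nl );

    endif;
    strfree( ebx );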

Note that the string the user enters in this example must exactly match "Hello", including the use of an upper case "H" at the beginning of the string. When processing user input, it is best to ignore alphabetic case in string comparisons because different users have different ideas about when they should be pressing the shift key on the keyboard. An easy solution is to use the HLA case-insensitive string comparison functions. These routines compare two strings ignoring any differences in alphabetic case. They use the following calling sequences:

    str.ieq( src1, src2 );
    str.ine( src1, src2 );
    str.ilt( src1, src2 );
    str.ile( src1, src2 );
    str.igt( src1, src2 );
    str.ige( src1, src2 );

Other than treating upper case characters the same as their lower case equivalents, these routines behave exactly like the former routines, returning true or false in EAX depending on the result of the comparison.

Like most high level languages, HLA compares strings using lexicographical ordering. This means that two strings are equal if and only if their lengths are the same and the corresponding characters in the two strings are exactly the same. For less than or greater than comparisons, lexicographical ordering corresponds to the way words appear in a dictionary. That is, "a" is less than "b" is less than "c", and so on. Actually, HLA compares the strings using the ASCII numeric codes for the characters, so if you are unsure whether "a" is less than a period, simply consult the ASCII character chart (incidentally, "a" is greater than a period in the ASCII character set, just in case you were wondering).

If two strings have different lengths, lexicographical ordering only worries about the length if the two strings exactly match up through the length of the shorter string. If this is the case, then the longer string is greater than the shorter string (and, conversely, the shorter string is less than the longer string). Note, however, that if the characters in the two strings do not match at all, then HLA's string comparison routines ignore the length of the string; e.g., "z" is always greater than "aaaaa" even though it is shorter.
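The ordering rules above can be checked with a few literal comparisons. This is only an illustrative sketch (the program name strCompareDemo is made up); given the rules as stated and the ASCII codes involved, each of the four conditions should be true, so all four messages should print.

```hla
// Sketch: lexicographical and case-insensitive comparisons.
// (strCompareDemo is an illustrative name.)

program strCompareDemo;
#include( "stdlib.hhf" );

begin strCompareDemo;

    // Same prefix, different lengths: the shorter string is less.

    if( str.lt( "abc", "abcd" )) then

        stdout.put( "'abc' is less than 'abcd'" nl );

    endif;

    // The first characters differ, so length plays no role.

    if( str.gt( "z", "aaaaa" )) then

        stdout.put( "'z' is greater than 'aaaaa'" nl );

    endif;

    // ASCII order: upper case letters precede lower case letters.

    if( str.lt( "Zebra", "apple" )) then

        stdout.put( "'Zebra' is less than 'apple'" nl );

    endif;

    // Ignoring case, these two strings are equal.

    if( str.ieq( "HELLO", "hello" )) then

        stdout.put( "'HELLO' equals 'hello' when case is ignored" nl );

    endif;

end strCompareDemo;
```

The third comparison is a reminder that the ordering follows ASCII codes rather than a true dictionary: "Zebra" sorts before "apple" unless you use the case-insensitive routines.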

The str.eq routine checks to see if two strings are equal. Sometimes, however, you might want to know whether one string contains another string. For example, you may want to know if some string contains the substring "north" or "south" to determine some action to take in a game. The HLA str.index routine lets you check to see if one string is contained as a substring of another. The str.index routine uses the following calling sequence:

    str.index( StrToSearch, SubstrToSearchFor );

This function returns, in EAX, the offset into StrToSearch where SubstrToSearchFor appears. This routine returns -1 in EAX if SubstrToSearchFor is not present in StrToSearch. Note that str.index will do a case-sensitive search. Therefore the strings must exactly match. There is no case-insensitive variant of str.index you can use.[20]
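As a brief sketch of how the return value might be tested, the following program searches a command string for "north". The names strIndexDemo and command are invented for the example, and EAX is coerced to int32 so that the comparison against -1 is a signed one. Assuming zero-based offsets, the first search should report offset 3, and the second search should fail because of the upper case "N".

```hla
// Sketch: testing the result of str.index.
// (strIndexDemo and command are illustrative names.)

program strIndexDemo;
#include( "stdlib.hhf" );

static
    command:    string := "go north by northwest";

begin strIndexDemo;

    str.index( command, "north" );
    if( (type int32 eax) <> -1 ) then

        stdout.put( "Found 'north' at offset ", (type uns32 eax), nl );

    else

        stdout.put( "'north' does not appear in the command" nl );

    endif;

    // The search is case sensitive, so "North" is not found.

    str.index( command, "North" );
    if( (type int32 eax) = -1 ) then

        stdout.put( "'North' does not appear in the command" nl );

    endif;

end strIndexDemo;
```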

The HLA string module contains many additional routines besides those this section presents. Space limitations and prerequisite knowledge prevent the presentation of all the string functions here; however, this does not mean that the remaining string functions are unimportant. You should definitely take a look at the HLA Standard Library documentation to learn everything you can about the powerful HLA string library routines.

[16]Warning to C/C++ users: Note that the order of the operands is opposite the C Standard Library strcpy function.

[17]Unless, of course, both string pointers contained the same address to begin with, in which case str.cpy copies the string data over the top of itself.

[18]Technically, str.a_substr, like all routines that call malloc to allocate storage, can raise an ex.MemoryAllocationFailure exception, but this is very unlikely to occur.

[19]This extension is actually a little more general than this section describes. A later chapter will explain it fully.

[20]However, HLA does provide routines that will convert all the characters in a string to one case or another. So you can make copies of the strings, convert all the characters in both copies to lower case, and then search using these converted strings. This will achieve the same result.




The Art of Assembly Language
ISBN: 1593272073
EAN: 2147483647
Year: 2005
Pages: 246
Authors: Randall Hyde