4.12 Accessing the Characters Within a String



Extracting individual characters from a string is a very common and easy task. In fact, it is so easy that HLA doesn't provide any specific procedure or language syntax for it: you simply use machine instructions. Once you have a pointer to the string data, a simple indexed addressing mode does the rest of the work for you.

Of course, the most important thing to keep in mind is that strings are pointers. Therefore, you cannot apply an indexed addressing mode directly to a string variable and expect to extract characters from the string. That is, if s is a string variable, then "mov( s[ebx], al );" does not fetch the character at position EBX in string s and place it in the AL register. Remember, s is just a pointer variable; an addressing mode like s[ebx] will simply fetch the byte at offset EBX in memory starting at the address of s (see Figure 4-1).

Figure 4-1: Incorrectly Indexing off a String Variable.

In Figure 4-1, assuming EBX contains three, s[ebx] does not access the fourth character in the string s; instead it fetches the fourth byte of the pointer to the string data. It is very unlikely that this is what you would want. Figure 4-2 shows the operation that is necessary to fetch a character from the string, assuming EBX contains the value of s.

Figure 4-2: Correctly Indexing off the Value of a String Variable.

In Figure 4-2 EBX contains the value of string s. The value of s is a pointer to the actual string data in memory. Therefore, EBX will point at the first character of the string when you load the value of s into EBX. The following code demonstrates how to access the fourth character of string s in this fashion:

      mov( s, ebx );         // Get pointer to string data into EBX.
      mov( [ebx+3], al );    // Fetch the fourth character of the string.
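As a minimal sketch, the two instructions above can be dropped into a complete program. The program name fourthChar and the sample string "Hello" are hypothetical and chosen only for illustration; the sketch assumes the HLA Standard Library is installed so that stdlib.hhf and stdout.put are available.

```hla
program fourthChar;
#include( "stdlib.hhf" )

static
    s: string := "Hello";    // Hypothetical sample; must hold at least four characters.

begin fourthChar;

    mov( s, ebx );           // EBX = value of s = address of the first character.
    mov( [ebx+3], al );      // Offset 3 from that address is the fourth character.
    stdout.put( "Fourth character: ", (type char al), nl );

end fourthChar;
```

With the sample string shown, offset 3 selects the second "l" of "Hello". Note that the sketch performs no bounds check; it is safe only because the string literal is known to be long enough.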

If you want to load the character at a variable, rather than fixed, offset into the string, then you can use one of the 80x86's scaled indexed addressing modes to fetch the character. For example, if an uns32 variable index contains the desired offset into the string, you could use the following code to access the character at s[index]:

      mov( s, ebx );           // Get address of string data into EBX.
      mov( index, ecx );       // Get desired offset into string.
      mov( [ebx+ecx], al );    // Get the desired character into AL.

There is only one problem with the code above: It does not check to ensure that the character at offset index actually exists. If index is greater than the current length of the string, then this code will fetch a garbage byte from memory. Unless you can determine a priori that index is always less than the length of the string, code like this is dangerous to use. A better solution is to check the index against the string's current length before attempting to access the character. The following code provides one way to do this.

      mov( s, ebx );
      mov( index, ecx );
      if( ecx < (type str.strRec [ebx]).length ) then
         mov( [ebx+ecx], al );
      else
         << error, string index is out of bounds >>
      endif;

In the else portion of this if statement you could take corrective action, print an error message, or raise an exception. If you want to explicitly raise an exception, you can use the HLA raise statement to accomplish this. The syntax for the raise statement is

      raise( integer_constant );
      raise( reg32 );

The value of the integer_constant or 32-bit register must be an exception number. Usually, this is one of the predefined constants in the excepts.hhf header file. An appropriate exception to raise when a string index is greater than the length of the string is ex.StringIndexError. The following code demonstrates raising this exception if the string index is out of bounds:

      mov( s, ebx );
      mov( index, ecx );
      if( ecx < (type str.strRec [ebx]).length ) then
         mov( [ebx+ecx], al );
      else
         raise( ex.StringIndexError );
      endif;
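To see the exception in action, the check can be wrapped in a try..endtry block that catches ex.StringIndexError. The following is only a sketch under a few assumptions: the program name safeIndex, the sample string, and the deliberately bad index value are hypothetical, and the code assumes the Standard Library's str.strRec record names its length field length.

```hla
program safeIndex;
#include( "stdlib.hhf" )

static
    s:     string := "Hello";   // Hypothetical sample string (five characters).
    index: uns32  := 9;         // Deliberately past the end of the string.

begin safeIndex;

    try

        mov( s, ebx );          // EBX points at the string data.
        mov( index, ecx );      // ECX holds the requested offset.
        if( ecx < (type str.strRec [ebx]).length ) then

            mov( [ebx+ecx], al );
            stdout.put( "Character: ", (type char al), nl );

        else

            raise( ex.StringIndexError );

        endif;

      exception( ex.StringIndexError )

        // Control transfers here when the raise statement executes.
        stdout.put( "Index ", index, " is out of bounds", nl );

    endtry;

end safeIndex;
```

Changing index to a value less than five should take the then branch instead and print the selected character. Without the try block, the raised exception would propagate to HLA's default handler, which stops the program with an error message.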

Another way to check to see if the string index is within bounds is to use the 80x86 bound instruction.




The Art of Assembly Language
ISBN: 1593272073
Year: 2005
Pages: 246
Authors: Randall Hyde
