In the early days of computers there were different types of data-processing equipment, and there was no common method of representing text. To alleviate this problem, the organization now known as the American National Standards Institute (ANSI) proposed the American Standard Code for Information Interchange (ASCII) in 1963. The standard was finalized in 1968 as a mapping of 128 characters (letters, numbers, punctuation, and control codes) to the numbers from 0 to 127 (see Table 3). The computer-minded reader may notice that this requires 7 bits and does not use an entire byte.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
1 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
2 | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
Table 3 lists the original 128 ASCII characters. The left column and the top row identify the first and second hexadecimal digits of the ASCII value. For example, the capital letter A has an ASCII value of 41 in hexadecimal format, and Z has an ASCII value of 5A. If more than one letter occupies a box, that value represents a special command character (see Table 4). Some of these special commands are designed for communications, some for file formats, and some are even available on the keyboard.
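The row-and-column lookup can be checked from OOo Basic itself. The following is a minimal sketch, assuming the built-in Hex function together with the ASC and CHR$ functions described later in this section; the routine name ExampleHexValue is hypothetical and chosen only for illustration.

```basic
Sub ExampleHexValue
  'Hex converts a number to its hexadecimal text.
  'ASC returns the numeric value of the first character in a string.
  Dim s As String
  s = "A = " & Hex(ASC("A")) & CHR$(10)      'Row 4, column 1: 41
  s = s & "Z = " & Hex(ASC("Z")) & CHR$(10)  'Row 5, column A: 5A
  s = s & "z = " & Hex(ASC("z"))             'Row 7, column A: 7A
  MsgBox s, 0, "Hexadecimal ASCII values"
End Sub
```

The first hexadecimal digit of each result is the row label in Table 3, and the second digit is the column label.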
Hex | DEC | Symbol | Description |
---|---|---|---|
00 | 0 | NUL | Null, usually signifying nothing |
01 | 1 | SOH | Start of heading |
02 | 2 | STX | Start of text |
03 | 3 | ETX | End of text |
04 | 4 | EOT | End of transmission (not the same as ETB) |
05 | 5 | ENQ | Enquiry |
06 | 6 | ACK | Acknowledge: "I am here" or "data successfully received" |
07 | 7 | BEL | Bell: causes teletype machines and many terminals to ring a bell |
08 | 8 | BS | Backspace: moves the cursor or print head backward (left) one space |
09 | 9 | HT | Horizontal tab: moves the cursor (or print head) right to the next tab stop |
0A | 10 | LF | Line feed or new line: moves the cursor (or print head) to a new line |
0B | 11 | VT | Vertical tab |
0C | 12 | FF | Form feed: advances paper to the top of the next page |
0D | 13 | CR | Carriage return: moves the cursor (or print head) to the left margin |
0E | 14 | SO | Shift out: switches the output device to an alternate character set |
0F | 15 | SI | Shift in: switches the output device back to the default character set |
10 | 16 | DLE | Data link escape |
11 | 17 | DC1 | Device control 1 |
12 | 18 | DC2 | Device control 2 |
13 | 19 | DC3 | Device control 3 |
14 | 20 | DC4 | Device control 4 |
15 | 21 | NAK | Negative acknowledge |
16 | 22 | SYN | Synchronous idle |
17 | 23 | ETB | End of transmission block (not the same as EOT) |
18 | 24 | CAN | Cancel |
19 | 25 | EM | End of medium |
1A | 26 | SUB | Substitute |
1B | 27 | ESC | Escape: this is the Esc key on your keyboard |
1C | 28 | FS | File separator |
1D | 29 | GS | Group separator |
1E | 30 | RS | Record separator |
1F | 31 | US | Unit separator |
7F | 127 | DEL | Delete: this is the Del key on your keyboard |
For most computers, the smallest easily stored and retrieved piece of data is a byte, which is composed of 8 bits. The characters in Table 3 require only 7 bits. To avoid wasting space, the Extended ASCII characters were introduced; these used the numbers 128 through 255. Although these characters added special, mathematical, graphic, and foreign characters, they just weren't enough to satisfy international use. Around 1986, Xerox started working to extend the character set to work with Asian characters. This work eventually led to Unicode, whose original design used 16-bit integers and allowed for 65,536 unique characters.
OOo stores characters as 16-bit unsigned integer Unicode values. The ASC and CHR functions convert between the integer value and the character value, for example, between 65 and A. Use the ASC function to determine the numerical ASCII value of the first character in a string. The return value is a 16-bit integer allowing for Unicode values. Only the first character in the string is used; the rest of the characters are ignored. A run-time error occurs if the string has zero length. This is essentially the inverse of the CHR$ function, which converts the number back into a character.
Compatibility | The CHR function is frequently written as CHR$. In Visual Basic, CHR$ returns a string and can't handle null input values, and CHR returns a variant that's able to accept and propagate null values. In OOo Basic, they are the same; they both return strings and they both generate a run-time error with a null input value. |
Use the CHR function to convert a 16-bit ASCII value to the character that it represents. This is useful when you want to insert special characters into a string. For example, CHR(10) is the new-line character. The CHR function is the inverse of the ASC function. Although the ASC function returns the Unicode numbers, these numbers are frequently generically referred to as "the ASCII value." Strictly speaking, this is incorrect, but it's a widely used slang expression. The numbers correspond directly to the ASCII values for the numbers 0 through 127, and having used the terminology for years, programmers aren't likely to stop. So, when you see the term "ASCII value" in this book, think "Unicode value."
    Print CHR$(65)                 'A
    Print ASC("Andrew")            '65
    s = "1" & CHR$(10) & "2"       'New line between 1 and 2
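Because ASC raises a run-time error for a zero-length string, a small guard can be useful when the input might be empty. The following is only a sketch: SafeASC and ExampleSafeASC are hypothetical names, and returning -1 for an empty string is an assumed convention, not something OOo Basic defines.

```basic
'Hypothetical helper: returns -1 rather than raising a run-time
'error when the string has zero length.
Function SafeASC(sInput As String) As Long
  If Len(sInput) = 0 Then
    SafeASC = -1              'Assumed sentinel for "no character"
  Else
    SafeASC = ASC(sInput)     'Only the first character is used
  End If
End Function

Sub ExampleSafeASC
  Print SafeASC("Andrew")     '65
  Print SafeASC("")           '-1
End Sub
```

The return type is Long so that the sentinel and every 16-bit Unicode value fit without overflow.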
Tip | Use the MsgBox statement to print strings that contain CHR$(10) or CHR$(13); both cause OOo Basic to print a new line. The Print statement displays a new dialog for each new line. MsgBox, however, properly displays new lines in a single dialog. |
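As a quick illustration of the tip, the sketch below builds one string that contains both a CHR$(10) and a CHR$(13) and hands it to MsgBox. The routine name ExampleNewLineInMsgBox and the sample text are made up for this example.

```basic
Sub ExampleNewLineInMsgBox
  Dim s As String
  'CHR$(10) is a line feed and CHR$(13) is a carriage return.
  s = "Line 1" & CHR$(10) & "Line 2" & CHR$(13) & "Line 3"
  'MsgBox shows the whole string in a single dialog.
  MsgBox s, 0, "New lines in one dialog"
End Sub
```

Replacing MsgBox with Print in this routine is an easy way to see the difference that the tip describes.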
While attempting to decipher the internal functions of OpenOffice.org, I frequently find strings that contain characters that aren't immediately visible, such as trailing spaces, new lines, and carriage returns. Converting the string to a sequence of ASCII values simplifies the process of recognizing the true contents of the string. See Listing 1 and Figure 1.
    Sub ExampleStringToASCII
      Dim s As String
      s = "AB"""" """" BA"
      MsgBox s & CHR$(10) & StringToASCII(s), 0, "String To ASCII"
    End Sub

    Function StringToASCII(sInput$) As String
      Dim s As String
      Dim i As Integer
      For i = 1 To Len(sInput$)
        s = s & CStr(ASC(Mid(sInput$, i, 1))) & " "
      Next
      StringToASCII = s
    End Function
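When the goal is to match the values against Table 3, hexadecimal output can be easier to read than decimal. The following variant is a sketch only: StringToHex and ExampleStringToHex are hypothetical names, and the code assumes the built-in Hex function, which does not pad its result with leading zeros.

```basic
'Hypothetical variant of StringToASCII that lists hexadecimal values.
Function StringToHex(sInput As String) As String
  Dim s As String
  Dim i As Integer
  For i = 1 To Len(sInput)
    'Convert each character to its numeric value, then to hex text
    s = s & Hex(ASC(Mid(sInput, i, 1))) & " "
  Next
  StringToHex = s
End Function

Sub ExampleStringToHex
  'An "A", a new line, and a trailing space: displays 41 A 20
  MsgBox StringToHex("A" & CHR$(10) & " "), 0, "String To Hex"
End Sub
```

The new line and the trailing space, which are invisible in the original string, appear here as A and 20.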
On more than one occasion, I needed to know exactly how OOo stored data in a text document. One common example is trying to manipulate new lines and new paragraphs in a manner not easily supported by regular expressions. The subroutine in Listing 2 displays the currently selected text as a string of ASCII values. The important thing to learn in this chapter is how to view the ASCII values associated with the text. This will show the characters used between paragraphs, for example. The methods to properly retrieve and manipulate selected text are covered later.
    Sub SelectedTextAsASCII()
      Dim vSelections             'Multiple disjointed selections
      Dim vSel                    'A single selection
      Dim vCursor                 'OOo document cursor
      Dim i As Integer            'Index variable
      Dim s As String             'Temporary utility string variable
      Dim bIsSelected As Boolean  'Is any text selected?

      bIsSelected = True          'Assume that text is selected
      'The current selection in the current controller.
      'If there is no current controller, it returns NULL.
      'ThisComponent refers to the current document
      vSelections = ThisComponent.getCurrentSelection()
      If IsNull(vSelections) OR IsEmpty(vSelections) Then
        bIsSelected = False
      ElseIf vSelections.getCount() = 0 Then
        bIsSelected = False
      End If
      If NOT bIsSelected Then        'If nothing is selected then say so
        Print "Nothing is selected"  'and then exit the subroutine
        Exit Sub
      End If
      'The selections are numbered from zero
      'Print the ASCII values of each
      For i = 0 To vSelections.getCount() - 1
        vSel = vSelections.getByIndex(i)
        vCursor = ThisComponent.Text.CreateTextCursorByRange(vSel)
        s = vCursor.getString()
        If Len(s) > 0 Then
          MsgBox StringToASCII(s), 0, "ASCII of Selection " & i
        ElseIf vSelections.getCount() = 1 Then
          Print "Nothing is selected"
        End If
      Next
    End Sub