ASCII and Unicode Values


In the early days of computers there were different types of data-processing equipment, and there was no common method of representing text. To alleviate this problem, the American Standards Association (ASA, the predecessor of the American National Standards Institute, ANSI) proposed the American Standard Code for Information Interchange (ASCII) in 1963. The standard was finalized in 1968 as a mapping of 128 characters (letters, numbers, punctuation, and control codes) to the numbers from 0 to 127 (see Table 3). The computer-minded reader may notice that this requires 7 bits and does not use an entire byte.

Table 3: The original 128 ASCII characters.

     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL

Table 3 lists the original 128 ASCII characters. The left column identifies the first hexadecimal digit of the ASCII value, and the top row identifies the second. For example, the capital letter A has an ASCII value of 41 in hexadecimal format, and Z has an ASCII value of 5A. If more than one letter occupies a box, that value represents a special command character (see Table 4). Some of these special commands are designed for communications, some for file formats, and some are even available on the keyboard.
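The row-and-column lookup can also be checked in code. The following is a minimal sketch, assuming the built-in Hex function together with the integer division (\) and MOD operators; the routine name ExampleTableLookup is hypothetical and is not part of the book's source files. It uses the ASC function, which is described later in this section.

```basic
Sub ExampleTableLookup
  Dim n As Integer
  n = ASC("A")                  'Decimal value of the character (65)
  'The row label is the high hexadecimal digit and
  'the column label is the low hexadecimal digit.
  MsgBox "Hex value: " & Hex(n) & CHR$(10) & _
         "Row: " & Hex(n \ 16) & CHR$(10) & _
         "Column: " & Hex(n MOD 16), 0, "Table lookup"
End Sub
```

For the letter A, this should report a hex value of 41, found in row 4 and column 1 of Table 3.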

Table 4: Non-printable ASCII characters.

Hex  Dec  Symbol  Description
00   0    NUL     Null, usually signifying nothing
01   1    SOH     Start of heading
02   2    STX     Start of text
03   3    ETX     End of text
04   4    EOT     End of transmission; not the same as ETB
05   5    ENQ     Enquiry
06   6    ACK     Acknowledge: I am here, or data successfully received
07   7    BEL     Bell: causes teletype machines and many terminals to ring a bell
08   8    BS      Backspace: moves the cursor or print head backward (left) one space
09   9    HT      Horizontal tab: moves the cursor (or print head) right to the next tab stop
0A   10   LF      Line feed or new line: moves the cursor (or print head) to a new line
0B   11   VT      Vertical tab
0C   12   FF      Form feed: advances paper to the top of the next page
0D   13   CR      Carriage return: moves the cursor (or print head) to the left margin
0E   14   SO      Shift out: switches the output device to an alternate character set
0F   15   SI      Shift in: switches the output device back to the default character set
10   16   DLE     Data link escape
11   17   DC1     Device control 1
12   18   DC2     Device control 2
13   19   DC3     Device control 3
14   20   DC4     Device control 4
15   21   NAK     Negative acknowledge
16   22   SYN     Synchronous idle
17   23   ETB     End of transmission block; not the same as EOT
18   24   CAN     Cancel
19   25   EM      End of medium
1A   26   SUB     Substitute
1B   27   ESC     Escape: this is the Esc key on your keyboard
1C   28   FS      File separator
1D   29   GS      Group separator
1E   30   RS      Record separator
1F   31   US      Unit separator
7F   127  DEL     Delete: this is the Del key on your keyboard

For most computers, the smallest easily stored and retrieved piece of data is a byte, which is composed of 8 bits. The characters in Table 3 require only 7 bits. To avoid wasting space, the Extended ASCII characters were introduced; these used the numbers 128 through 255. Although these characters added special, mathematical, graphic, and foreign characters, they just weren't enough to satisfy international use. Around 1986, Xerox started working to extend the character set to work with Asian characters. This work eventually led to Unicode, which was originally designed around 16-bit integers, allowing for 65,536 unique characters.

OOo stores characters as 16-bit unsigned integer Unicode values. The ASC and CHR functions convert between the integer value and the character value, for example, between 65 and A. Use the ASC function to determine the numerical ASCII value of the first character in a string. The return value is a 16-bit integer allowing for Unicode values. Only the first character in the string is used; the rest of the characters are ignored. A run-time error occurs if the string has zero length. This is essentially the inverse of the CHR$ function, which converts the number back into a character.
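Because ASC raises a run-time error for a zero-length string, it can help to guard the call. The following is a minimal sketch; the function name SafeASC and the choice of -1 as the "no character" result are assumptions made for illustration and are not part of OOo Basic or the book's source files.

```basic
'Hypothetical helper: return the value of the first character in s,
'or -1 if s is empty, instead of raising a run-time error.
Function SafeASC(s As String) As Long
  If Len(s) = 0 Then
    SafeASC = -1
  Else
    SafeASC = ASC(s)    'Only the first character is used
  End If
End Function

Sub ExampleSafeASC
  Print SafeASC("Andrew")   '65, the value for A
  Print SafeASC("")         '-1, no run-time error
End Sub
```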

Compatibility  

The CHR function is frequently written as CHR$. In Visual Basic, CHR$ returns a string and can't handle null input values, and CHR returns a variant that's able to accept and propagate null values. In OOo Basic, they are the same; they both return strings and they both generate a run-time error with a null input value.

Use the CHR function to convert a 16-bit ASCII value to the character that it represents. This is useful when you want to insert special characters into a string. For example, CHR(10) is the new-line character. The CHR function is the inverse of the ASC function. Although the ASC function returns Unicode numbers, these numbers are frequently referred to generically as "the ASCII value." Strictly speaking, this is incorrect, but it's a widely used slang expression. The numbers correspond directly to the ASCII values for the numbers 0 through 255, and having used the terminology for years, programmers aren't likely to stop. So, when you see the term "ASCII value" in this book, think "Unicode value."

 Print CHR$(65)            'A
 Print ASC("Andrew")       '65
 s = "1" & CHR$(10) & "2"  'New line between 1 and 2
Tip  

Use the MsgBox statement to print strings that contain CHR$(10) or CHR$(13); both cause OOo Basic to print a new line. The Print statement displays a new dialog for each new line. MsgBox, however, properly displays new lines in a single dialog.
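As a small illustration of this tip, the sketch below builds one string containing both characters and shows it with MsgBox. The routine name ExampleNewLines and the sample text are made up for this example.

```basic
Sub ExampleNewLines
  Dim s As String
  'CHR$(10) is a line feed and CHR$(13) is a carriage return.
  s = "Line 1" & CHR$(10) & "Line 2" & CHR$(13) & "Line 3"
  'MsgBox shows the text in a single dialog.
  MsgBox s, 0, "New lines"
End Sub
```

Replacing MsgBox with Print in this routine is an easy way to see the difference described above.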

While attempting to decipher the internal functions of OpenOffice.org, I frequently find strings that contain characters that aren't immediately visible, such as trailing spaces, new lines, and carriage returns. Converting the string to a sequence of ASCII characters simplifies the process of recognizing the true contents of the string. See Listing 1 and Figure 1.


Figure 1: A string followed by its corresponding ASCII values: A=65, B=66, "=34, and so on.
Listing 1: StringToASCII is found in the String module in this chapter's source code files as SC06.sxw.
start example
Sub ExampleStringToASCII
  Dim s As String
  s = "AB""""  """"  BA"
  MsgBox s & CHR$(10) & StringToASCII(s), 0, "String To ASCII"
End Sub

'Return the ASCII (Unicode) value of each character in sInput$,
'with a space after each value.
Function StringToASCII(sInput$) As String
  Dim s As String
  Dim i As Integer
  For i = 1 To Len(sInput$)
    s = s & CStr(ASC(Mid(sInput$, i, 1))) & " "
  Next
  StringToASCII = s
End Function
end example
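When comparing the output against Table 3, hexadecimal values can be easier to look up than decimal ones. The following variation on Listing 1 is a sketch only; the names StringToHex and ExampleStringToHex are hypothetical and are not in the book's source files, and the routine assumes the text contains only characters in the ASCII range.

```basic
'Hypothetical variation on StringToASCII that returns hexadecimal values.
Function StringToHex(sInput As String) As String
  Dim s As String
  Dim i As Long
  For i = 1 To Len(sInput)
    'Hex converts the numeric value to a hexadecimal string.
    s = s & Hex(ASC(Mid(sInput, i, 1))) & " "
  Next
  StringToHex = s
End Function

Sub ExampleStringToHex
  MsgBox StringToHex("AB BA"), 0, "String To Hex"
End Sub
```

For the string "AB BA", the dialog should show 41 42 20 42 41, which matches the row and column positions of A, B, and the space character in Table 3.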
 

On more than one occasion, I needed to know exactly how OOo stored data in a text document. One common example is trying to manipulate new lines and new paragraphs in a manner not easily supported by regular expressions. The subroutine in Listing 2 displays the currently selected text as a string of ASCII values. The important thing to learn in this chapter is how to view the ASCII values associated with the text. This will show the characters used between paragraphs, for example. The methods to properly retrieve and manipulate selected text are covered later.

Listing 2: SelectedTextAsASCII is found in the String module in this chapter's source code files as SC06.sxw.
start example
Sub SelectedTextAsASCII()
  Dim vSelections                        'Multiple disjointed selections
  Dim vSel                               'A single selection
  Dim vCursor                            'OOo document cursor
  Dim i As Integer                       'Index variable
  Dim s As String                        'Temporary utility string variable
  Dim bIsSelected As Boolean             'Is any text selected?

  bIsSelected = True                     'Assume that text is selected

  'The current selection in the current controller.
  'If there is no current controller, it returns NULL.
  'ThisComponent refers to the current document
  vSelections = ThisComponent.getCurrentSelection()
  If IsNull(vSelections) OR IsEmpty(vSelections) Then
    bIsSelected = False
  ElseIf vSelections.getCount() = 0 Then
    bIsSelected = False
  End If

  If NOT bIsSelected Then          'If nothing is selected then say so
    Print "Nothing is selected"    'and then exit the subroutine
    Exit Sub
  End If

  'The selections are numbered from zero
  'Print the ASCII values of each
  For i = 0 To vSelections.getCount() - 1
    vSel = vSelections.getByIndex(i)
    vCursor = ThisComponent.Text.CreateTextCursorByRange(vSel)
    s = vCursor.getString()
    If Len(s) > 0 Then
      MsgBox StringToASCII(s), 0, "ASCII of Selection " & i
    ElseIf vSelections.getCount() = 1 Then
      Print "Nothing is selected"
    End If
  Next
End Sub
end example
 



OpenOffice.org Macros Explained
ISBN: 1930919514
Year: 2004
Pages: 203
