2.4. Program Commands

We often need program-specific ways of entering characters from a keyboard, either because there is no key for a character we need or there is but it does not work. The program involved might be part of system software, or it might be an application program. We describe here some typical cases.

2.4.1. Copying via the Clipboard

In typical computer systems, you can copy data from one program to another through an internal storage area called the clipboard. On Windows, you can usually highlight text with the mouse or select a piece of text otherwise, and then press Ctrl-C to copy, click on a location in another window, and press Ctrl-V to paste a copy of the text there. This also works inside a program, of course, so you can use it to create copies of a character or a string.

This feature is well known to most users and is often very convenient, though it cannot be the primary method of writing text. You can, however, copy characters from web pages or from text documents specifically designed for use as "cliptext."

Often this technique copies text formatting along with the text. If you copy bold 16-point Verdana text from Excel to Word, you get bold 16-point Verdana text, not text in the normal font as defined by your Word settings or template. This might be desirable, but more often, it is a problem. Moreover, constructs like hypertext links may get copied along with the text. To make sure that only the plain text is inserted, you can first paste the text into Notepad, select it again there, press Ctrl-C, and paste it in the desired destination.

2.4.2. Menu Commands

Programs may have command menus for inserting characters, so that characters are identified by some names or glyphs. At the simplest, you just select a command and a subcommand from a menu. Usually it is more complicated, to allow the insertion of more characters than can conveniently be included in a command menu.

2.4.2.1. Insertion menu in Thunderbird

In Mozilla Thunderbird, when composing an email message, you can select Insert → Characters and Symbols. This opens a small window, as in Figure 2-7. There, you can select a class of characters by clicking on one of the radio buttons. This affects the drop-down menu under the buttons. For example, when Common Symbols is selected, the Character drop-down menu contains a collection of Latin 1 special characters (other than letters), such as © and ±.

Such an input method is intuitively easy and can be found and used by a user even without any documentation. On the other hand, it is rather clumsy, since any insertion requires several steps.

2.4.2.2. Symbol (character) insertion menu in MS Word

In MS Word, you can use the command Insert → Symbol to invoke an auxiliary window, which has two modes of operation. In the default mode, Symbols, you can select a character from a table, as explained in the section "Character Maps" later in this chapter. You enter the second mode by clicking on Special Characters. There, you can pick up a character from a short list, as shown in Figure 2-8. The list also contains information about shortcut keys for the characters, so it can be used to check such things. Among the characters that have no default shortcut keys, the fixed-width spaces (em space, en space, and 1/4 em space = four-per-em space) work with a few fonts only.

Figure 2-7. Character insertion window in Thunderbird

Figure 2-8. Special Characters insertion window in MS Word

Figure 2-9. Formatting characters and markers in MS Word


Some of the symbols that you can add via the Special Characters menu (or corresponding shortcuts) are really just internal markers used by MS Word. Either they do not correspond to any Unicode character or they involve using Unicode characters in an abnormal way. For example, Optional Hyphen, which is invisible but indicates an allowed hyphenation point in a word, is not what you might expect, namely the soft hyphen (U+00AD), but a marker recognized by MS Word. We will return to this in Chapter 8.

2.4.2.3. The Show Formatting (Show ¶) tool

In MS Word, there is a special mode of viewing a document on screen so that some formatting characters and markers appear as visible symbols, called formatting marks, e.g., paragraph breaks as ¶ symbols. The mode works independently of the input method that has been used. It is useful, for example, when checking which spaces are no-break spaces, since they will appear as small rings (resembling the degree sign ° but larger). The formatting marks do not appear in printed copies.

You can select a view with formatting marks by clicking the ¶ icon in the toolbar. Clicking the icon again changes the view to normal. If this icon is not present, you can use a menu command like View → Show Paragraph Marks instead.

The detailed look varies by the version of MS Word, but it resembles the one shown in Figure 2-9. The columns in it indicate the name of the character or other construct; its Unicode code number, if any; its display in "Show Formatting" mode; an example of such appearance in text context; the same text in normal view; and a method for typing in the construct. "Insert menu" refers to the Special Characters insertion menu. Table cells are created using special commands in MS Word. The issue is mentioned here because making paragraph marks visible also makes a ¤ symbol appear at the end of the content of each table cell.

Sometimes you might wish to view some of the formatting indicators but not all. This can be achieved by choosing Tools → Options, and then selecting the View pane, shown in Figure 2-10. Some settings under "Formatting marks" affect several marks; e.g., "Optional breaks" makes line break prohibitions visible, too. Clicking the ¶ icon in the toolbar corresponds to checking "All" in this dialog.

Figure 2-10. Setting the display of formatting marks in MS Word

2.4.3. Methods Using the Alt Key on Windows

There are several ways to type a character on Windows if you know the code number. Not all of the ways work in all contexts, and they differ from each other in ways that make them easy to confuse. Table 2-2 summarizes the methods, and then each method is explained in detail. The example characters, for which key sequences are given in columns "Ex. 1" and "Ex. 2," are the copyright sign © (U+00A9) and the ohm sign Ω (U+2126). The expression "New Windows software" refers to programs such as WordPad and MS Word on Windows XP and newer.

Table 2-2. General methods for character input on Windows

| Method          | Ex. 1    | Ex. 2      | Description          | Applicability        |
|-----------------|----------|------------|----------------------|----------------------|
| Alt-0n          | Alt-0169 |            | Uses decimal number  | Windows in general   |
| Alt-n (n ≤ 255) | Alt-184  |            | Code page dependent  | Windows in general   |
| Alt-n (n > 255) |          | Alt-8486   | Unicode, decimal     | New Windows software |
| n Alt-X         | a9 Alt-X | 2126 Alt-X | Unicode, hexadecimal | New Windows software |
| Alt-+n          | Alt-+a9  | Alt-+2126  | Unicode, hexadecimal | New Windows (often)  |

2.4.3.1. The Alt-0n method

On Windows systems, you can usually (some application programs may override this) produce any character in the 8-bit Windows character set (such as Windows Latin 1) as follows:

  1. Press down the Alt key and keep it down. (Use the Alt key, not AltGr.)

  2. Using the separate numeric keypad (not the numbers above the letter keys!), type the number of the character in decimal and with a leading zero. You do not see anything happen on screen when you do this.

  3. Release the Alt key. The character now appears.

The code values for which this works are in the range 32 to 255 (decimal). For instance, to produce the letter Ä (which has code 196 in decimal), you would hold Alt down, type 0196, and then release Alt. Upon releasing Alt, the character should appear on the screen.

In MS Word, the method works only if Num Lock is set (by pressing the Num Lock key in the numeric keypad).

Portable computers often lack a numeric keypad. They usually have a key combination (explained in the manual) that makes some normal keys simulate a numeric keypad. Typically, the same combination turns the situation back to normal.

This method is often referred to as Alt-0nnn to emphasize that you normally type four digits starting with zero, but we use the shorter notation Alt-0n. It is quite possible to use fewer than four digits when the number is small; for example, Alt-092 produces a \. However, characters with such small code numbers can usually be typed more directly.

The codes are interpreted according to the Windows character code, which may vary by country and language version as well as keyboard settings. In the Western world, the code is normally windows-1252, also known as Windows Latin 1. This means, as will be explained in Chapter 3, that code numbers 32 to 126 and 160 to 255 (decimal) are the same as in Unicode. However, if you, for example, set your keyboard layout to Russian, the meanings change: they refer to windows-1251 (Windows Cyrillic). Then, for example, Alt-0169 still produces ©, since the copyright sign has the same position in windows-1251 as in windows-1252, but Alt-0233 produces й and not é as with English keyboard settings.
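The dependence on the Windows character code can be checked with Python's built-in codecs. The following is a minimal sketch, not a description of how Windows implements the method; the function name alt_zero is made up for the example, and cp1252 and cp1251 are Python's names for windows-1252 and windows-1251.

```python
def alt_zero(n, windows_code="cp1252"):
    """Return the character that Alt-0n would be expected to produce.

    The decimal number n is looked up in the 8-bit Windows code, which
    defaults to windows-1252 here. A few positions are unassigned in
    Python's cp1252 codec and raise UnicodeDecodeError.
    """
    if not 32 <= n <= 255:
        raise ValueError("Alt-0n is described for code values 32 to 255 only")
    return bytes([n]).decode(windows_code)


if __name__ == "__main__":
    for number in (196, 169, 233):
        western = alt_zero(number)
        cyrillic = alt_zero(number, "cp1251")
        print(f"Alt-0{number}: windows-1252 {western!r}, windows-1251 {cyrillic!r}")
```

Code 169 gives the copyright sign in both codes, whereas code 233 gives é in windows-1252 and the Cyrillic letter й in windows-1251.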

2.4.3.2. The code page-specific Alt-n method

If you use the method described in the previous section but omit the leading zero, i.e., use Alt-n, the effect is different. That way, you insert the character that occupies code position n in the DOS character code! More generally, the character inserted is the one in that position in the code page in use. Code pages have the same assignments for code numbers 32 to 126 as Unicode but differ from Unicode and from other code pages in other positions.

Code pages will be discussed in Chapter 3. For a quick reference to the character assignments in code pages, see http://www.fileformat.info/info/charset/codepage.htm.

Briefly, a code page is an 8-bit encoding that is used in some contexts in Windows environments, a holdover from DOS systems. You can find out your system code page number by giving the command chcp on the command prompt (DOS prompt). Normally, your computer uses the code page defined by the manufacturer according to the market area, called the OEM code page (OEM stands for original equipment manufacturer).

For example, Alt-196 might insert a graphic character, box drawings light horizontal (U+2500). To get the copyright sign, you would use Alt-184, if your system's current code page is 850, which is common in Western Europe. In that code page, the code of the copyright sign is 184 in decimal (B8 in hexadecimal). Code page 437, which is common in the U.S., does not contain the copyright sign at all. On the other hand, it contains some Greek letters and additional mathematical symbols.
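The code page lookups mentioned above can be reproduced with Python's codecs for code pages 850 and 437. This is a sketch under the assumption that Python's cp850 and cp437 tables match the code pages in question; the function names alt_n and in_code_page are made up for the example.

```python
def alt_n(n, code_page="cp850"):
    """Return the character at decimal position n in an OEM code page.

    Positions below 32 are left out, because Python's codecs treat them
    as control characters rather than as graphic characters.
    """
    if not 32 <= n <= 255:
        raise ValueError("this sketch covers positions 32 to 255 only")
    return bytes([n]).decode(code_page)


def in_code_page(ch, code_page):
    """Tell whether a character has a position in the given code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(repr(alt_n(196)))                  # box drawings light horizontal
    print(repr(alt_n(184)))                  # copyright sign in code page 850
    print(in_code_page("\u00a9", "cp437"))   # copyright sign in code page 437?
    print(in_code_page("\u03b1", "cp437"))   # Greek small alpha in code page 437?
```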

There are variations in the behavior of various Windows programs in this area. Using DOS codes and this input method is best avoided, although it would save a little typing. It is very easy to get confused with the methods and the numbers.

It may happen that if you type, for example, Alt-1 or Alt-3, you get graphic characters like ☺ and ♥. This is because some code page versions have allocated graphic characters to code positions 0 to 31 (decimal), although these positions are normally reserved for control characters. Though occasionally handy, such methods cannot be relied on, since they depend on the code page, its version, and the program.

2.4.3.3. The Unicode-based Alt-n method

In some programs on modern Windows systems, you can use Alt-n for n > 255 to produce the Unicode character with code number n in decimal. Thus, the method is:

  1. Press and hold the Alt key.

  2. Type the decimal number n using the numeric keypad. Nothing visible happens yet.

  3. Release the Alt key. The character with Unicode number n now appears.

This works in programs such as WordPad and Word on Windows XP. In many other programs, e.g., in Notepad or in form field input in Internet Explorer, the method does not work. If you try it, the value n is mapped to a value in the range 0 to 255 (using division by 256 and taking the remainder), and it has a code page-specific effect as above.
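The difference between the two behaviors can be sketched as follows. The function name alt_decimal and its parameters are made up for the example, and code page 850 is only an assumed default for the fallback case.

```python
def alt_decimal(n, unicode_aware=True, code_page="cp850"):
    """Sketch of what Alt-n produces for a decimal number n.

    In a program that supports the Unicode-based method, a number above
    255 is taken as a Unicode number. Otherwise the number is reduced to
    the remainder of division by 256 and looked up in the code page.
    """
    if n > 255 and unicode_aware:
        return chr(n)
    return bytes([n % 256]).decode(code_page)


if __name__ == "__main__":
    print(repr(alt_decimal(8486)))                       # ohm sign, U+2126
    print(repr(alt_decimal(8486, unicode_aware=False)))  # 8486 % 256 == 38
```

In the second call, 8486 is reduced to 38, which is the ampersand in code page 850 as well as in ASCII, so the result is quite different from the intended ohm sign.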

Characters that belong to Windows Latin 1 but not to ISO Latin 1 thus have two alternative sequences. For example, the em dash (U+2014) can be typed as Alt-0151 or as Alt-8212.
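The alternative sequences can be listed by walking through positions 128 to 159 of windows-1252, which is where Windows Latin 1 differs from ISO Latin 1. This is a sketch using Python's cp1252 codec; the function name dual_sequences is made up for the example.

```python
def dual_sequences():
    """Map each windows-1252 character in positions 128 to 159 to its
    two decimal sequences: Alt-0n (Windows code) and Alt-n (Unicode)."""
    result = {}
    for n in range(128, 160):
        try:
            ch = bytes([n]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # position is unassigned in Python's cp1252 table
        result[ch] = (f"Alt-0{n}", f"Alt-{ord(ch)}")
    return result


if __name__ == "__main__":
    for ch, (windows_seq, unicode_seq) in dual_sequences().items():
        print(f"U+{ord(ch):04X}  {windows_seq}  {unicode_seq}")
```

In this listing, the em dash (U+2014) appears with Alt-0151 and Alt-8212, and the en dash (U+2013) with Alt-0150 and Alt-8211.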

This method is relatively fast but requires you to type the decimal code number "in blind," i.e., without seeing what you have typed. The next method is different.

2.4.3.4. The Alt-X method

This method, too, works only in some programs on modern Windows systems. Like the Unicode-based Alt-n method, it uses so-called Uniscribe program code for handling the keyboard, and only a few programs use Uniscribe so far.

The method consists of the following:

  1. Type the hexadecimal Unicode code number of the character you want. You can use the normal keyboard. (You can alternatively use the numeric keypad for digits 0 to 9, if the Num Lock mode is set.)

  2. Press Alt-X, i.e., hold down the Alt button and press the "X" button. The number now turns to the character.

The method also works for any string of hexadecimal digits in a document, not just a string you have directly typed. If an existing document contains, say, the string 101, you can click on the position right after the last digit and press Alt-X. The string then turns to the character ā (U+0101). If the hexadecimal string is preceded by U+ (or u+), those characters, too, disappear in the process.

The method applies to the maximal sequence of hexadecimal digits before the point where Alt-X is typed. If you would like to write bā, you cannot just type b101 Alt-X, since the letter "b" is a hexadecimal digit, so you would get the character U+B101. Instead, you can type a space before the digits 101, apply Alt-X, and then remove the extra space.

The method works in the other direction, too: when the preceding character is not a hexadecimal digit, pressing Alt-X turns the character to its hexadecimal Unicode number. However, the effect is not always directly reversible. If you have typed "8−" (digit eight and minus sign, U+2212) and then press Alt-X, you get 82212. Pressing Alt-X again would turn this five-digit string to the character U+82212. If such a problem appears, insert a space temporarily.
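The behavior described above can be imitated with a short Python function. This is a simplified model, not the actual logic of MS Word or Uniscribe: the function name alt_x is made up, the cursor is assumed to be at the end of the string, the reverse direction is assumed to produce at least four uppercase hexadecimal digits, and numbers above 10FFFF are simply left alone.

```python
import re

# A maximal run of hexadecimal digits at the end of the text,
# optionally preceded by "U+" or "u+".
HEX_RUN = re.compile(r"(?:[Uu]\+)?([0-9A-Fa-f]+)$")


def alt_x(text):
    """Apply a simplified Alt-X toggle at the end of the text."""
    match = HEX_RUN.search(text)
    if match:
        code = int(match.group(1), 16)
        if code > 0x10FFFF:
            return text  # assumption: out-of-range numbers are left as is
        # Replace the digits (and any U+ prefix) with the character.
        return text[:match.start()] + chr(code)
    if not text:
        return text
    # Other direction: replace the last character with its hex number.
    return text[:-1] + f"{ord(text[-1]):04X}"


if __name__ == "__main__":
    for sample in ("101", "U+0101", "b101", "b 101", "8\u2212", "82212"):
        print(repr(sample), "->", repr(alt_x(sample)))
```

The samples cover the cases discussed in the text: the maximal run of digits in b101, the workaround with a temporary space, and the minus sign that turns "8−" into a five-digit number.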

Program-specific keyboard command assignments may mask out the possibility of using this method. It is therefore not a good idea to define a text-processing macro with an invocation that starts with Alt-X.

2.4.3.5. The Alt-+n method

This method is similar to the n Alt-X method in the sense that it uses hexadecimal Unicode numbers and works on modern Windows systems. However, this method has some specific features:

  • It works in most programs and contexts, including Notepad, form fields in web browsers, Unicode email, etc.

  • It has limitations due to Alt key assignments in programs.

  • It depends on system configuration, so it might not work by default.

To use this method, proceed as follows:

  1. Press and hold the Alt key.

  2. Press the + key in the numeric keypad. (Think of it as indicating that the following number is to be treated as hexadecimal.)

  3. Type the hexadecimal number n using either normal keys or (for digits 0 to 9) the numeric keypad. Nothing visible happens yet.

  4. Release the Alt key. The character with Unicode number n now appears.

If this does not work (and you are using a relatively modern Windows version, such as Windows XP), the likely reason is that your system has been configured not to use this input method. This can be changed through the Windows registry settings, using the registry editor (regedit). If you are not familiar with registry settings, try to find someone who knows them and can fix your settings. Under the key HKEY_CURRENT_USER\Control Panel\Input Method, set EnableHexNumpad to 1 (one). If the value does not exist, add it there and set its type to REG_SZ. You must then reboot the system. This inconvenience is probably partly intentional: the method is still experimental and lacks support.
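For reference, the registry setting can also be written down as the text of a registration (.reg) file. The following sketch only builds that text; it does not touch the registry. The function name hex_numpad_reg_text is made up for the example, and it is an assumption that your version of regedit imports a file with this content. Review the text before importing anything into the registry.

```python
def hex_numpad_reg_text():
    """Build the text of a .reg file that sets EnableHexNumpad to "1".

    A value written in double quotes in a .reg file is a string value
    (REG_SZ). The key path and value name are the ones given in the text.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_CURRENT_USER\Control Panel\Input Method]",
        '"EnableHexNumpad"="1"',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(hex_numpad_reg_text())
```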

Using the Alt key together with normal keys (outside the numeric keypad) often conflicts with keyboard shortcuts in programs, such as Alt-F for opening a File menu. This may cause limitations for characters with letters in their hexadecimal code number.

2.4.4. Ctrl-Q and Other Methods in Emacs

In the Emacs editor, which is popular especially on Unix-type systems, you can produce any ISO Latin 1 character by typing first Ctrl-Q, and then the character's code as a three-digit octal (base 8) number. To produce Ä, you would type Ctrl-Q followed by the three digits 304 (and expect the Ä character to appear on screen). This method is often referred to as C-Q-nnn.
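If you know a character but not its octal code, the three digits can be computed. This is a small Python sketch, with function names (ctrl_q_digits, from_ctrl_q_digits) made up for the example; it is restricted to ISO Latin 1, as in the description above.

```python
def ctrl_q_digits(ch):
    """Return the three octal digits to type after Ctrl-Q for a character."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("this sketch covers ISO Latin 1 characters only")
    return f"{code:03o}"


def from_ctrl_q_digits(digits):
    """Return the character denoted by a string of octal digits."""
    return chr(int(digits, 8))


if __name__ == "__main__":
    for ch in "\u00c4\u00a9\u00e9":  # A with diaeresis, copyright sign, e acute
        print(f"U+{ord(ch):04X}: Ctrl-Q {ctrl_q_digits(ch)}")
```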

There are additional ways of entering many ISO Latin 1 characters in Emacs. You can for example use the M-x iso-accents-mode command (where M-x means meta-X, which can typically be produced by pressing first the Esc key, and then the X key). It sets Emacs to a mode of operation where several ASCII characters are converted to diacritic marks when typed before a letter. For example, typing 'e would produce e with an acute accent, é.
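The idea of turning an ASCII character plus a letter into an accented letter can be illustrated with Unicode normalization. This is a rough sketch of the principle, not of the Emacs implementation: only the apostrophe for the acute accent comes from the description above, the other prefixes in the table are assumptions chosen for illustration, and the function name compose_accent is made up.

```python
import unicodedata

# Prefix character -> combining diacritic mark. Only the apostrophe
# (acute accent) is taken from the text; the rest are assumptions.
ACCENT_PREFIXES = {
    "'": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    "^": "\u0302",  # combining circumflex accent
    '"': "\u0308",  # combining diaeresis
    "~": "\u0303",  # combining tilde
}


def compose_accent(prefix, letter):
    """Combine a prefix and a letter into one ISO Latin 1 character.

    If no such precomposed character exists, the two characters are
    returned unchanged.
    """
    mark = ACCENT_PREFIXES.get(prefix)
    if mark is None:
        return prefix + letter
    composed = unicodedata.normalize("NFC", letter + mark)
    if len(composed) == 1 and ord(composed) <= 0xFF:
        return composed
    return prefix + letter


if __name__ == "__main__":
    for prefix, letter in (("'", "e"), ('"', "a"), ("~", "n"), ("'", "x")):
        print(prefix + letter, "->", compose_accent(prefix, letter))
```

The actual mode in Emacs uses its own table of combinations, which need not agree with this sketch in every case.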



Unicode Explained
ISBN: 059610121X
Year: 2006