11.3 The mechanics of mixing text of different writing directions

Evolving rendering needs are being documented by Unicode and are influenced by XSL-FO.

The rendering algorithm is governed primarily by the Unicode Bidirectional Algorithm:
- http://www.unicode.org/unicode/reports/tr9
Nuances are described by the XSL-FO 1.0 Recommendation (Unicode BIDI Algorithm, Section 5.8) in conjunction with the CSS definition.
- XSL-FO slightly modifies the CSS definition, e.g. mandating that an unspecified direction property is assumed to be that of the writing-mode in effect.
Here, we do not attempt to repeat the detailed algorithms described in the above documents, but only to give a general overview.

There are three categories of direction strength for Unicode characters.

Strong characters are from language groups with well-defined left-to-right or right-to-left character-progression directions.
- This includes "marks" that are non-spacing invisible characters with strong direction:
  - &#x200e; left-to-right mark (LRM),
  - &#x200f; right-to-left mark (RLM).
- This includes directional controls that are interpreted by the rendering algorithm:
  - &#x202a; left-to-right embed begin (LRE),
  - &#x202b; right-to-left embed begin (RLE),
  - &#x202d; left-to-right override begin (LRO),
  - &#x202e; right-to-left override begin (RLO).
Weak characters are digits, currency symbols, and some punctuation characters.
- This includes mirroring characters whose presentation on the canvas may be different from their representation in the data.
  - This includes "(", ")", "[", "]", "<", ">", "{", "}", "«", "»", etc.
  - A mirrored character is rendered in the writing direction of adjacent text, which may require it to be flipped by the formatter.
  - The stylesheet writer doesn't have any responsibility for doing the flipping.
- This includes &#x202c; pop directional formatting (PDF) for embedding levels.
Neutral characters are white space characters and some separator characters.
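
The categories above can be explored using the bidirectional class that the Unicode character database records for each character. The following is a minimal sketch in Python, assuming only the standard unicodedata module. The strength() helper and its grouping of classes are illustrative names and choices, not part of any specification; the grouping follows the class table of the Unicode Bidirectional Algorithm, which is finer-grained than this overview (for example, it reports brackets as "other neutral" and keeps the embedding controls in classes of their own).

```python
import unicodedata

# Grouping of bidirectional classes, assumed from the class table in UAX #9.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def strength(ch):
    """Return a coarse strength label for a single character (hypothetical helper)."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    if cls in NEUTRAL:
        return "neutral"
    # LRE, RLE, LRO, RLO, PDF and the isolate controls have classes of their own.
    return "explicit"


# A Latin letter, a Hebrew letter, LRM, RLM, a digit, a currency symbol,
# a space, a mirrored bracket, LRE, and PDF.
samples = ["A", "\u05d1", "\u200e", "\u200f", "7", "$", " ", "(", "\u202a", "\u202c"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch):>3} "
          f"{strength(ch):8} mirrored={unicodedata.mirrored(ch)}")
```

The mirrored flag in the last column marks the characters that a formatter may need to flip when they are rendered in right-to-left text.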

The embedding algorithm is based on isolating groups of sequences in "embedding levels."

An embedded sequence sets a progression direction for the members of the sequence.
- Members are characters or other embedded sequences.
The stylesheet writer may need to introduce embedding levels to keep a sequence of characters together.
- The stylesheet would need to recognize or predict the semantic affinity of the group to know where to embed a new level.
Using bidi-override will create an embedding level.
- The progression direction of the level is either the specified direction, or the current writing-mode direction if not specified.
- Using a unicode-bidi value of "embed" doesn't change the inherent writing direction of the individual characters in the embedding level.
- Using a unicode-bidi value of "bidi-override" forces the characters to render in the direction of the embedding level, ignoring the inherent properties of the characters.
  - All characters are assigned the overriding direction as if they were strongly directed; their original nature is forgotten.
Embedding levels are indicated in the resulting stream using LRE, RLE, LRO, RLO, and PDF Unicode characters.
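
The effect on the character stream can be pictured with a small sketch. This is only an illustration in Python of the mapping described above, not how any particular formatter is implemented; the bidi_wrap() function and its parameter names are hypothetical, chosen to echo the unicode-bidi and direction properties.

```python
# Unicode directional formatting characters named in the text.
LRE, RLE = "\u202a", "\u202b"   # embed begin, left-to-right / right-to-left
LRO, RLO = "\u202d", "\u202e"   # override begin, left-to-right / right-to-left
PDF = "\u202c"                  # pop directional formatting

# Opening control for each (unicode-bidi, direction) combination.
OPENERS = {
    ("embed", "ltr"): LRE,
    ("embed", "rtl"): RLE,
    ("bidi-override", "ltr"): LRO,
    ("bidi-override", "rtl"): RLO,
}


def bidi_wrap(text, unicode_bidi="embed", direction="ltr"):
    """Bracket text with the controls that open and close one embedding level."""
    return OPENERS[(unicode_bidi, direction)] + text + PDF


wrapped = bidi_wrap("English Test 13", "embed", "ltr")
print([f"U+{ord(ch):04X}" for ch in (wrapped[0], wrapped[-1])])
```

Nesting one call inside another models an embedded sequence whose members include a further embedded sequence.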

The resulting stream of characters is grouped according to the Unicode directionality controls.

The grouping of characters crosses boundaries of embedding levels.
The act of grouping weak characters with strong characters gives direction to the weak characters.
- Weak characters are influenced by their proximity to strong characters so they become strongly directed themselves.
- Weak characters between two strong characters of the same direction adopt that direction.
- Weak characters preceding a strong character without white space interruption adopt the strong character's direction.
- Weak characters following a strong character and any intervening white space adopt the strong character's direction.
The characters are rendered after all of the characters have been assigned a direction.
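
The three rules of thumb above can be tried out with a toy resolver. This sketch is not the Unicode Bidirectional Algorithm and ignores embedding levels, number handling, and mirroring; it only applies the rules as worded in this overview. Two assumptions are labeled in the code: every character that is neither strong nor white space is treated as weak, and the "preceding" rule is tried before the "following" rule. The function names are hypothetical.

```python
import unicodedata

# Bidirectional classes treated as strong, mapped to a direction.
STRONG = {"L": "L", "R": "R", "AL": "R"}


def strong_direction(ch):
    """Return 'L' or 'R' for a strongly directed character, else None."""
    return STRONG.get(unicodedata.bidirectional(ch))


def resolve_weak(text, base="L"):
    """Give each non-space character a direction; white space stays None."""
    dirs = [strong_direction(ch) for ch in text]
    resolved = list(dirs)
    for i, ch in enumerate(text):
        if dirs[i] is not None or ch.isspace():
            continue  # strong characters keep their direction
        direction = None
        # Assumption: try first the rule for a weak character preceding a
        # strong character with no white space in between.
        for j in range(i + 1, len(text)):
            if text[j].isspace():
                break
            if dirs[j] is not None:
                direction = dirs[j]
                break
        # Otherwise follow the nearest strong character to the left,
        # looking past any intervening white space.
        if direction is None:
            for j in range(i - 1, -1, -1):
                if dirs[j] is not None:
                    direction = dirs[j]
                    break
        # With no strong neighbor at all, fall back to the base direction.
        resolved[i] = direction or base
    return resolved


for sample in ("\u05d0 16 / 89end", "\u05d0 16 / 89 end"):
    directions = resolve_weak(sample, base="R")
    print(repr(sample), directions[sample.index("8")])
```

In the first string the digits "89" touch the Latin word and take its left-to-right direction; in the second, the space cuts them off, so they follow the Hebrew letter to their left instead.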

Examples in this book include sequences of right-to-left language text, as defined in Example 11-1.

&hebrew-test; is "Hebrew test" in Hebrew;
&arabic-test; is "Arabic test" in Arabic.

Example 11-1 Sample Unicode sequences of right-to-left language text

    <!ENTITY hebrew-test  "&#x05D1;&#x05D3;&#x05D9;&#x05E7;&#x05D4;
                           &#x05E2;&#x05D1;&#x05E8;&#x05D9;&#x05EA;">
    <!ENTITY arabic-test1 "&#x0625;&#x062e;&#x062a;&#x0628;&#x0627;">
    <!ENTITY arabic-test2 "&#x0631; &#x0639;&#x0631;&#x0628;&#x064a;">
    <!ENTITY arabic-test  "&arabic-test1;&arabic-test2;">
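
As a quick check that these sample sequences consist of strongly directed right-to-left characters, the code points can be looked up in the Unicode character database. This is a minimal sketch in Python using the standard unicodedata module; the variable and function names are illustrative, and the white space inside the Hebrew entity value is simplified to a single space.

```python
import unicodedata

# Code points copied from the entity declarations in Example 11-1.
hebrew_test = "\u05d1\u05d3\u05d9\u05e7\u05d4 \u05e2\u05d1\u05e8\u05d9\u05ea"
arabic_test = "\u0625\u062e\u062a\u0628\u0627" + "\u0631 \u0639\u0631\u0628\u064a"


def letter_classes(text):
    """Return the set of bidi classes of the non-space characters in text."""
    return {unicodedata.bidirectional(ch) for ch in text if not ch.isspace()}


print(letter_classes(hebrew_test))  # {'R'}
print(letter_classes(arabic_test))  # {'AL'}
```

Both classes, R for Hebrew letters and AL for Arabic letters, are strong right-to-left classes.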

Consider a detailed example of mixing sequences of different directions in Example 11-2.

Lines in the test are grouped differently to illustrate how groups are ordered in the writing direction for rendering before the characters found in the group are rendered;
- test "12" is left-to-right with embedded sequences of right-to-left text;
- test "23" is right-to-left with the content identical to test "12";
- test "34" is almost identical to test "23" except for a space introduced after "89";
- test "45" is right-to-left but the language text inside is in groups without overriding direction;
- test "56" is left-to-right but the language text inside is in groups that override direction;
- test "67" is right-to-left but the language text inside is in groups that override direction.
- Note that where the direction isn't specified, it is inferred from the writing mode;
  - this differs from CSS, which assumes left-to-right when not specified.
Weak punctuation characters separate the strong script characters.
- The first three tests illustrate the differences in assignment of direction to weak characters.
- Note how the introduction of a space at the end of the "34" test changes the direction assignment to the "89" characters, compared to the "89" characters in the "23" test.
Embedding groups arrange the groups of left-to-right sequences;
- in test "34", the English text is shown to the left of the French text;
- in test "45", the French text is shown to the left of the English text.
Overriding the direction results in an improper presentation of language text;
- in test "56", the Hebrew and Arabic sequences are inappropriately presented;
- in test "67", the English and French sequences are inappropriately presented.

Example 11-2 Controlling bidirectionality using grouping

    <block-container>
      <block>12 - English Test 13 , Test Français 14
      + &hebrew-test; 15 = &arabic-test; 16 / 89end</block>
    </block-container>
    <block-container writing-mode="rl-tb">
      <block>23 - English Test 13 , Test Français 14
      + &hebrew-test; 15 = &arabic-test; 16 / 89end</block>
    </block-container>
    <block-container writing-mode="rl-tb">
      <block>34 - English Test 13 , Test Français 14
      + &hebrew-test; 15 = &arabic-test; 16 / 89 end</block>
    </block-container>
    <block-container writing-mode="rl-tb">
      <block>45 - <bidi-override unicode-bidi="embed"
                                 >English Test 13</bidi-override>
                , <bidi-override unicode-bidi="embed"
                                 >Test Français 14</bidi-override>
                + <bidi-override unicode-bidi="embed"
                                 >&hebrew-test; 15</bidi-override>
                = <bidi-override unicode-bidi="embed"
                                 >&arabic-test; 16</bidi-override>
                / 89 end</block></block-container>
    <block-container>
      <block>56 - <bidi-override unicode-bidi="bidi-override"
                                 >English Test 13</bidi-override>
                , <bidi-override unicode-bidi="bidi-override"
                                 >Test Français 14</bidi-override>
                + <bidi-override unicode-bidi="bidi-override"
                                 >&hebrew-test; 15</bidi-override>
                = <bidi-override unicode-bidi="bidi-override"
                                 >&arabic-test; 16</bidi-override>
                / 89 end</block></block-container>
    <block-container writing-mode="rl-tb">
      <block>67 - <bidi-override unicode-bidi="bidi-override"
                                 >English Test 13</bidi-override>
                , <bidi-override unicode-bidi="bidi-override"
                                 >Test Français 14</bidi-override>
                + <bidi-override unicode-bidi="bidi-override"
                                 >&hebrew-test; 15</bidi-override>
                = <bidi-override unicode-bidi="bidi-override"
                                 >&arabic-test; 16</bidi-override>
                / 89 end</block></block-container>

Figure 11-1 illustrates the on-screen interpretation of Example 11-2.

The line annotations are only for illustrative purposes; they visualise the groupings of characters and the character writing directions.
This test does not incorporate explicit use of Unicode directionality characters;
- the use of bidi-override introduces these characters into the rendering stream.
Note how the grouping of characters for strength purposes crosses the boundaries of embedding levels.

Figure 11-1. Example of bidirectionality

graphics/11fig01.gif