11.3 The mechanics of mixing text of different writing directions

Evolving rendering needs are being documented by Unicode and are influenced by XSL-FO.

  • The rendering algorithm is governed primarily by the Unicode Bidirectional Algorithm:

    • http://www.unicode.org/unicode/reports/tr9

  • Nuances are described by the XSL-FO 1.0 Recommendation (Unicode BIDI Algorithm, Section 5.8) in conjunction with the CSS definition.

    • XSL-FO slightly modifies the CSS definition, e.g. mandating that an unspecified direction property is assumed to be that of the writing-mode in effect.

  • Here, we do not attempt to repeat the detailed algorithms described in the above documents, but only to give a general overview.

There are three categories of direction strength for Unicode characters.

  • Strong characters are from language groups with well defined left-to-right or right-to-left character-progression directions.

    • This includes "marks" that are non-spacing invisible characters with strong direction:

      • &#x200e; left-to-right mark (LRM),

      • &#x200f; right-to-left mark (RLM).

    • This includes directional controls that are interpreted by the rendering algorithm:

      • &#x202a; left-to-right embed begin (LRE),

      • &#x202b; right-to-left embed begin (RLE),

      • &#x202d; left-to-right override begin (LRO),

      • &#x202e; right-to-left override begin (RLO).

  • Weak characters are digits, currency symbols, and some punctuation characters.

    • This includes mirroring characters whose presentation on the canvas may be different from their representation in the data.

      • This includes "(", ")", "[", "]", "<", ">", "{", "}", "«", "»", etc.

      • A mirrored character is rendered in the writing direction of adjacent text, which may require it to be flipped by the formatter.

      • The stylesheet writer doesn't have any responsibility for doing the flipping.

    • This includes &#x202c; Pop Directional Formatting (PDF) for embedding levels.

  • Neutral characters are white space characters and some separator characters.
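The three categories can be explored with the Unicode character database that ships with Python. The following is a minimal sketch, not part of the XSL-FO or Unicode specifications: the `strength` and `is_mirrored` helpers are hypothetical names, and the assignment of bidirectional classes to the three strengths is an assumption that follows the grouping described above. The interpreter's Unicode data may classify individual characters (brackets, for example) differently from this overview.

```python
import unicodedata

# Assumed grouping of Unicode bidirectional classes into the three strengths
# described in the text; this mapping is an illustration, not a normative table.
STRONG = {"L", "R", "AL", "LRE", "LRO", "RLE", "RLO"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN", "PDF"}
NEUTRAL = {"B", "S", "WS", "ON"}


def strength(ch):
    """Return 'strong', 'weak', 'neutral', or 'other' for one character."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in STRONG:
        return "strong"
    if bidi in WEAK:
        return "weak"
    if bidi in NEUTRAL:
        return "neutral"
    return "other"  # classes not covered by the assumed grouping


def is_mirrored(ch):
    """True when the character database marks the character as mirrored."""
    return unicodedata.mirrored(ch) == 1


if __name__ == "__main__":
    samples = ["A", "\u05D1", "\u200E", "\u202B", "7", "$", "\u202C", " "]
    for ch in samples:
        print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), strength(ch))
    print("( mirrored:", is_mirrored("("))
```

Running the module prints the bidirectional class and the assumed strength for a Latin letter, a Hebrew letter, LRM, RLE, a digit, a currency symbol, PDF, and a space.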

The embedding algorithm is based on isolating groups of sequences in "embedding levels."

  • An embedded sequence sets a progression direction for the members of the sequence.

    • Members are characters or other embedded sequences.

  • The stylesheet writer may need to introduce embedding levels to keep a sequence of characters together.

    • The stylesheet would need to recognize or predict the semantic affinity of the group to know where to embed a new level.

  • Using bidi-override will create an embedding level.

    • The progression direction of the level is either the specified direction, or the current writing-mode direction if not specified.

    • Using a unicode-bidi value of "embed" doesn't change the inherent writing direction of the individual characters in the embedding level.

    • Using a unicode-bidi value of "bidi-override" forces the characters to render in the direction of the embedding level, ignoring the inherent properties of the characters.

      • All characters are assigned the overriding direction as if they were strongly directed; their original nature is forgotten.

  • Embedding levels are indicated in the resulting stream using LRE, RLE, LRO, RLO, and PDF Unicode characters.
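The correspondence between the two unicode-bidi values and the Unicode control characters can be sketched as a small helper. This is only an illustration of the character stream described above, not formatter code: the `wrap` function and its `direction` argument values ("ltr", "rtl") are hypothetical names chosen for the sketch.

```python
# Unicode directional formatting characters named in the text.
LRE, RLE = "\u202A", "\u202B"   # embed begin, left-to-right / right-to-left
LRO, RLO = "\u202D", "\u202E"   # override begin, left-to-right / right-to-left
PDF = "\u202C"                  # pop directional formatting (ends the level)

# Hypothetical lookup: (direction, unicode-bidi value) -> opening control.
_OPENERS = {
    ("ltr", "embed"): LRE,
    ("rtl", "embed"): RLE,
    ("ltr", "bidi-override"): LRO,
    ("rtl", "bidi-override"): RLO,
}


def wrap(text, direction, unicode_bidi):
    """Bracket text with the controls that mark one embedding level."""
    try:
        opener = _OPENERS[(direction, unicode_bidi)]
    except KeyError:
        raise ValueError(f"unsupported combination: {direction}, {unicode_bidi}")
    return opener + text + PDF


if __name__ == "__main__":
    stream = wrap("English Test 13", "rtl", "embed")
    print([f"U+{ord(c):04X}" for c in (stream[0], stream[-1])])
```

When no direction is specified, the caller would pass the direction of the writing mode in effect, following the XSL-FO rule described above.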

The resulting stream of characters is grouped according to the Unicode directionality controls.

  • The grouping of characters crosses boundaries of embedding levels.

  • The act of grouping weak characters with strong characters gives direction to the weak characters.

    • Weak characters are influenced by their proximity to strong characters so they become strongly directed themselves.

    • Weak characters between two strong characters of the same direction adopt that direction.

    • Weak characters preceding a strong character without white space interruption adopt the strong character's direction.

    • Weak characters following a strong character and any intervening white space adopt the strong character's direction.

  • The characters are rendered after all of the characters have been assigned a direction.
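The proximity rules above can be approximated with a toy resolver. This is a deliberately simplified sketch and not the Unicode Bidirectional Algorithm: the `coarse_class` and `resolve_weak` names are hypothetical, every character that is neither strong nor white space is treated as weak, and the order in which the rules are tried is an assumption of the sketch (a strong character that follows without white space is checked first, then the nearest preceding strong character).

```python
import unicodedata


def coarse_class(ch):
    """Reduce a character to 'L', 'R', 'N' (white space) or 'W' (all else)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "L"
    if bidi in ("R", "AL"):
        return "R"
    if bidi in ("WS", "B", "S"):
        return "N"
    return "W"  # simplification: digits and punctuation alike


def resolve_weak(text):
    """Return one class letter per character, with weak characters resolved."""
    classes = [coarse_class(ch) for ch in text]
    resolved = list(classes)
    for i, cls in enumerate(classes):
        if cls != "W":
            continue
        # Tried first: a strong character follows with no white space between.
        j = i + 1
        while j < len(classes) and classes[j] == "W":
            j += 1
        if j < len(classes) and classes[j] in ("L", "R"):
            resolved[i] = classes[j]
            continue
        # Otherwise: nearest preceding strong character, skipping weak
        # characters and white space.
        k = i - 1
        while k >= 0 and classes[k] in ("W", "N"):
            k -= 1
        if k >= 0:
            resolved[i] = classes[k]
    return "".join(resolved)


if __name__ == "__main__":
    print(resolve_weak("\u05D0 89end"))   # digits directly before Latin text
    print(resolve_weak("\u05D0 89 end"))  # a space separates digits and text
```

With a Hebrew letter followed by " 89end", the digits take the direction of "end"; with " 89 end", the space interrupts and the digits take the direction of the Hebrew letter instead.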

Examples in this book include sequences of right-to-left language text, as defined in Example 11-1.

  • &hebrew-test; is "Hebrew test" in Hebrew;

  • &arabic-test; is "Arabic test" in Arabic.

Example 11-1 Sample Unicode sequences of right-to-left language text
<!ENTITY hebrew-test  "&#x05D1;&#x05D3;&#x05D9;&#x05E7;&#x05D4;
                       &#x05E2;&#x05D1;&#x05E8;&#x05D9;&#x05EA;">
<!ENTITY arabic-test1 "&#x0625;&#x062e;&#x062a;&#x0628;&#x0627;">
<!ENTITY arabic-test2 "&#x0631; &#x0639;&#x0631;&#x0628;&#x064a;">
<!ENTITY arabic-test  "&arabic-test1;&arabic-test2;">
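As a quick check of these sequences, the character references can be expanded and inspected outside a formatter. The sketch below makes two assumptions: Python's `html.unescape` stands in for the entity expansion an XML parser would perform, and the line break inside the Hebrew entity value is written as a single space. The `bidi_classes` helper is a hypothetical name.

```python
import html
import unicodedata

# The two sample sequences, with numeric character references expanded.
HEBREW_TEST = html.unescape(
    "&#x05D1;&#x05D3;&#x05D9;&#x05E7;&#x05D4; "
    "&#x05E2;&#x05D1;&#x05E8;&#x05D9;&#x05EA;"
)
ARABIC_TEST = html.unescape(
    "&#x0625;&#x062e;&#x062a;&#x0628;&#x0627;"
    "&#x0631; &#x0639;&#x0631;&#x0628;&#x064a;"
)


def bidi_classes(text):
    """Return the set of Unicode bidirectional classes used in text."""
    return {unicodedata.bidirectional(ch) for ch in text}


if __name__ == "__main__":
    for label, text in (("hebrew-test", HEBREW_TEST), ("arabic-test", ARABIC_TEST)):
        print(label, sorted(bidi_classes(text)))
        for ch in text:
            if not ch.isspace():
                print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Both sequences consist only of right-to-left letters and a white-space separator, which is what makes them useful as strongly directed test text.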

Consider a detailed example of mixing sequences of different directions in Example 11-2.

  • Lines in the test are grouped differently to illustrate how groups are ordered in the writing direction for rendering before the characters found in the group are rendered;

    • test "12" is left-to-right with embedded sequences of right-to-left text;

    • test "23" is right-to-left with the content identical to test "12";

    • test "34" is almost identical to test "23" except for a space introduced after "89";

    • test "45" is right-to-left but the language text inside is in groups without overriding direction;

    • test "56" is left-to-right but the language text inside is in groups that override direction;

    • test "67" is right-to-left but the language text inside is in groups that override direction.

    • Note that where the direction isn't specified, it is inferred from the writing mode;

      • this differs from CSS, which assumes left-to-right when the direction is not specified.

  • Weak punctuation characters separate the strong script characters.

    • The first three tests illustrate the differences in assignment of direction to weak characters.

    • Note how the introduction of a space at the end of the "34" test changes the direction assignment to the "89" characters, compared to the "89" characters in the "23" test.

  • Embedding groups arrange the groups of left-to-right sequences;

    • in test "34", the English text is shown to the left of the French text;

    • in test "45", the French text is shown to the left of the English text.

  • Overriding the direction results in an improper presentation of language text;

    • in test "56", the Hebrew and Arabic sequences are inappropriately presented;

    • in test "67", the English and French sequences are inappropriately presented.

Example 11-2 Controlling bidirectionality using grouping
<block-container>
  <block>12 - English Test 13 , Test Français 14
  + &hebrew-test; 15 = &arabic-test; 16 / 89end</block>
</block-container>
<block-container writing-mode="rl-tb">
  <block>23 - English Test 13 , Test Français 14
  + &hebrew-test; 15 = &arabic-test; 16 / 89end</block>
</block-container>
<block-container writing-mode="rl-tb">
  <block>34 - English Test 13 , Test Français 14
  + &hebrew-test; 15 = &arabic-test; 16 / 89 end</block>
</block-container>
<block-container writing-mode="rl-tb">
  <block>45 - <bidi-override unicode-bidi="embed"
                             >English Test 13</bidi-override>
            , <bidi-override unicode-bidi="embed"
                             >Test Français 14</bidi-override>
            + <bidi-override unicode-bidi="embed"
                             >&hebrew-test; 15</bidi-override>
            = <bidi-override unicode-bidi="embed"
                             >&arabic-test; 16</bidi-override>
            / 89 end</block></block-container>
<block-container>
  <block>56 - <bidi-override unicode-bidi="bidi-override"
                             >English Test 13</bidi-override>
            , <bidi-override unicode-bidi="bidi-override"
                             >Test Français 14</bidi-override>
            + <bidi-override unicode-bidi="bidi-override"
                             >&hebrew-test; 15</bidi-override>
            = <bidi-override unicode-bidi="bidi-override"
                             >&arabic-test; 16</bidi-override>
            / 89 end</block></block-container>
<block-container writing-mode="rl-tb">
  <block>67 - <bidi-override unicode-bidi="bidi-override"
                             >English Test 13</bidi-override>
            , <bidi-override unicode-bidi="bidi-override"
                             >Test Français 14</bidi-override>
            + <bidi-override unicode-bidi="bidi-override"
                             >&hebrew-test; 15</bidi-override>
            = <bidi-override unicode-bidi="bidi-override"
                             >&arabic-test; 16</bidi-override>
            / 89 end</block></block-container>

Figure 11-1 illustrates the on-screen interpretation of Example 11-2.

  • The line annotations are only for illustrative purposes; they visualise the groupings of characters and the character writing directions.

  • This test does not incorporate explicit use of Unicode directionality characters;

    • the use of bidi-override introduces these characters into the rendering stream.

  • Note how the grouping of characters for strength purposes goes over the bounds of embedded levels.

Figure 11-1. Example of bidirectionality


Definitive XSL-FO
ISBN: 0131403745
EAN: 2147483647
Year: 2002
Pages: 99
Authors: G. Ken Holman
