6.2 Railroad Diagrams

     

6.2 Railroad Diagrams

In the film Planes, Trains, and Automobiles , characters played by Steve Martin and John Candy are faced with one improbable obstacle after another as they struggle to arrive home in time for the Thanksgiving holiday. Having compared this chapter with a sea voyage, it's reminiscent of that film to consider yet another mode of transportation, railways, as a means of understanding the SELinux policy language.

However, unlike many of the decisions of the film characters, my decision to introduce railroad diagrams is not capricious. Such diagrams were used in the 1970s by famous computer scientist Niklaus Wirth to develop and explain Pascal, one of the most successful programming languages. Since then, they've been used to explain many other programming languages. Although they can be cumbersome to create, they're quick to learn as well as easy to read and understand, so they're just about ideal as a means of explanation. Let's further mix our metaphors by diving into an exposition of railroad diagrams.

6.2.1 What Railroad Diagrams Do

Railroad diagrams are also known as syntax diagrams or syntax charts . They present the grammar of a formal language, such as one used for programming. However, formal languages also underlie the files used to configure systems and applications, such as the files that specify the SELinux security policy, so these diagrams are well suited to our immediate purpose.

Railroad diagrams specify two kinds of symbols:


Literal

A literal is a symbol that consists of one or more specific characters. Literals are generally punctuation marks, operators, or keywords of some sort .


Replaceable text

Replaceable text consists of text that has variable content.

These definitions will become clearer in the context of several small examples given in the following section.

6.2.2 How Railroad Diagrams Work

Figure 6-1 shows a railroad diagram that defines a literal representing the letter "a."

Figure 6-1. The letter "a"
figs/selx_0601.gif

The diagram contains two parts :


Line

The line guides you in reading the railroad diagram. If you're disappointed that the line doesn't more closely resemble a railroad track, I apologize. But, it's customary to draw the line in the simple fashion shown in the figure. You read the railroad diagram by following the line from left to right.


Oval

The oval represents a literal ”that is, a specific character, namely the letter "a."

Literal symbols (text that appears in the file exactly as shown in the diagram) appear in lightly shaded boxes, while replaceable symbols (which should be replaced with appropriate values by the administrator) appear in darkly shaded boxes.

One way to use a railroad diagram is as a means of parsing sentences following the grammar represented by the diagram. To do so, follow the line from left to right and attempt to match each symbol you encounter with a corresponding token in the sentence. If you can do so, the sentence is grammatical ; otherwise it's not.

The railroad diagram for the letter "a" is trivially simple, and therefore it is not much fun or valuable for practice. So, let's consider Figure 6-2, which presents a somewhat more sophisticated sentence, one that has the same form as an SELinux security policy attribute declaration. This sentence consists of three components that must appear in the indicated order:

  • A literal representing the keyword attribute

  • Replaceable text known as an id

  • A literal representing a semicolon

Figure 6-2. Attribute declaration
figs/selx_0602.gif

The railroad diagram merely references ”but doesn't define ”the replaceable text id , which would be defined by another diagram. I'll present a diagram defining id shortly. For now, let's simply understand id as representing an identifier of the sort used in many programming languages, consisting of a letter followed by zero or more letters or digits.

Given our understanding of the replaceable text id , our railroad diagram tells us that sentences such as the following are grammatical attribute declarations:

 attribute x; attribute xyz; attribute xyz123; 

Similarly, our railroad diagram tells us that sentences such as the following are not grammatical attribute declarations:

 attribute x        # lacks final semicolon attrib x;          # abbreviates required literal "attribute" attribute 123;     # contains integer rather than id 

Let's now consider a somewhat more complex railroad diagram, shown in Figure 6-3, which represents a digit. Notice how multiple tracks branch off the main line, so that you can completely traverse the railroad diagram by matching any of the ten literals representing a digit.

Figure 6-3. Digit
figs/selx_0603.gif

If you're familiar with regular expressions, you may realize that the syntax represented by Figure 6-3 could easily be represented by the regular expression:

 [0123456789] 

or, more concisely:

 [0-9] 

Let's now consider a set of three railroad diagrams that, together with Figure 6-3, define the composition of a signed integer. Figure 6-4 tells us that a signed integer consists of a sign, consisting of the literal + or - , followed by replaceable text named Unsigned_Integer , which obviously represents an unsigned integer.

Figure 6-4. Signed_Integer
figs/selx_0604.gif

Figure 6-5 defines Unsigned_integer in terms of Digit , the replaceable text defined in Figure 6-3. In Figure 6-5, notice the track that leads from the right side of the second instance of Digit to the left side of the same instance. This track makes it possible to include multiple occurrences of the second instance of Digit . The railroad diagram tells us that an unsigned integer consists of a digit, followed by zero or more digits. Put more plainly, it tells us that an unsigned integer consists of one or more digits.

Figure 6-5. Unsigned_Integer
figs/selx_0605.gif

You might represent the syntax shown in Figures Figure 6-4 and Figure 6-5 by the regular expression:

 [+-]\d\d* 

or the equivalent:

 [+-]\d+ 

Figure 6-6 puts together the definitions of Signed_Integer and Unsigned_Integer , telling us that an Integer consists of either a Signed_Integer or an Unsigned_Integer .

Figure 6-6. Integer
figs/selx_0606.gif

By now, you've seen everything necessary to understand the railroad diagrams we'll use to describe the SELinux security policy language. But, let's consider a couple additional railroad diagrams for good measure. Figure 6-7 shows another way of defining an Integer , one that consists of only a single railroad diagram. I suggest that you study the figure and convince yourself that the definition it offers is indeed equivalent to the previous definition that required three distinct railroad diagrams. Of course, like the three-diagram definition, the newer one references the diagram defining a Digit . So, it's not fully self-contained. But, including a definition of Digit rather than a reference to Digit in the newer definition would hopelessly clutter the definition.

Figure 6-7. Another way of specifying an Integer
figs/selx_0607.gif

Finally, let's consider one more railroad diagram, given in Figure 6-8. This railroad diagram defines the composition of an Identifier , which consists of a letter, followed by zero or more digits, letters, or underscores, which can be freely intermingled. I trust that it's evident that the railroad is much faster and easier to read and understand than the equivalent English sentence.

Figure 6-8. Identifier
figs/selx_0608.gif

I also presume you noticed that the replaceable text Letter , which is used in the railroad diagram, is defined neither by the diagram nor earlier in this chapter. I could give a railroad diagram defining this replaceable text, but it would be rather badly cluttered, since a letter can be any one of 52 literals: the lowercase and uppercase letters of the Roman alphabet.



SELinux. NSA's Open Source Security Enhanced Linux
Selinux: NSAs Open Source Security Enhanced Linux
ISBN: 0596007167
EAN: 2147483647
Year: 2003
Pages: 100
Authors: Bill McCarty

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net