Item 6. Name Elements with Camel Case

There are no standard naming conventions for XML. I've seen XML applications that use all capitals, all lowercase letters, words separated by hyphens, words separated by underscores, and more. Without being too fanatical about it, I recommend camel case, usingInternalCapitalizationInLieuOfWhiteSpaceLikeThis, simply because our eyes are better trained to follow it.

In the programming sections of Usenet, case conventions are second only to indentation as a source of pointless erudition and time-wasting flameage. There are many good naming and case conventions, most of which have nothing to strongly recommend them over any other. Most modern languages like Java, Delphi, and C# have tended to adopt one convention, even if they don't enforce it in the compiler, in order to facilitate the legibility of code between different people and groups. It doesn't matter which convention is picked as long as a single convention is chosen.

XML, unfortunately, does not have any recommended naming or case conventions for element and attribute names. Multiple conventions such as those listed below are used in practice.

XSLT uses all lower case with hyphens separating words, as in xsl:value-of, xsl:apply-templates, xsl:for-each, and xsl:attribute-set.
The W3C XML Schema Language uses camel case with an initial lowercase letter, as in xsd:complexType, xsd:simpleType, xsd:gMonthDay, and xsi:schemaLocation.
DocBook uses lower case exclusively and never separates the words, as in para, firstname, biblioentry, chapterinfo, methodsynopsis, and listitem.
MathML uses lower case exclusively, does not separate the words, and furthermore often abbreviates the words, as in mi (math identifier), mn (math number), mfrac (math fraction), msqrt (math square root), and reln (relation).
SOAP 1.2 uses camel case with initial capitals for element names (env:Body, env:Envelope, env:Header, and so on) and camel case with an initial lowercase letter for attributes (encodingStyle, env:role, env:mustUnderstand, and so on).
XHTML uses lower case exclusively with fairly short, abbreviated names, as in p, div, head, h1, tr, td, img, and so on.

The one style you tend not to see is exclusively upper case like this:

 <STATEMENT xmlns="http://namespaces.megabank.com/">
   <BANK>MegaBank</BANK>
   <ACCOUNT>
     <NUMBER>00003145298</NUMBER>
     <TYPE>Savings</TYPE>
     <OWNER>John Doe</OWNER>
   </ACCOUNT>
   <DATE>2003-30-02</DATE>
   <OPENINGBALANCE>5266.34</OPENINGBALANCE>
   <CLOSINGBALANCE>5266.34</CLOSINGBALANCE>
 </STATEMENT>

The reason is simple: In English and other case-aware languages (not, for example, Chinese and Hebrew) human eyes are trained to recognize words by their shape. After the first or second grade, readers do not sound out each letter when trying to identify a word. At least with common words like first, case, and language, we recognize the entire word as a unit. The ascenders and descenders of letters like d, g, l, k, and j are major contributors to the overall shape of a word. HOWEVER, IN UPPER CASE ALL LETTERS HAVE EXACTLY THE SAME HEIGHT, SO A SENTENCE WRITTEN IN PURE UPPER CASE IS MUCH HARDER TO READ. See what I mean? Consequently, a document will be much easier to read if you stick to lower case or a mix of upper and lower case.

XML vocabularies do tend to be verbose. Most vocabularies prefer to spell out the complete names of things rather than abbreviating. A few applications, such as DocBook and XHTML (in which a significant fraction of the user base authors by hand), use at least some abbreviations, but in general it's considered good form to spell out all words completely. This naturally raises the question of how to break the words. Spaces aren't legal in XML names, but the next best thing is an underscore, as shown below.

 <Opening_Balance>5266.34</Opening_Balance>
 <Closing_Balance>5266.34</Closing_Balance>

Surprisingly, this tends not to be very common. The hyphen is a little more common but is often eschewed because many data binding APIs can't easily map names containing hyphens to class names. The two most common conventions are camel case and pure lower case. However, pure lower case creates excessively long new words that are not easily recognized by their shape since they're unfamiliar. Camel case is much closer to what readers subconsciously expect. It is cleaner, easier to follow, and easier to debug. Thus I recommend using camel case.
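To make the data binding point concrete, here is a minimal sketch in Python, standing in for whatever language a binding tool might target. It is only an illustration: the helper names to_camel_case and is_usable_identifier are hypothetical and are not part of any real binding API. It checks whether an element name could serve unchanged as a class or field name, and shows one way a tool might fold a hyphenated or underscored name into camel case.

```python
import keyword
import re


def to_camel_case(name: str, upper_first: bool = True) -> str:
    """Convert a hyphen- or underscore-separated XML name to camel case.

    Hypothetical helper for illustration; real binding tools apply
    their own, usually more elaborate, mapping rules.
    """
    words = [w for w in re.split(r"[-_]", name) if w]
    if not words:
        return name
    # Capitalize the first letter of each word and keep the rest as written.
    camel = "".join(w[0].upper() + w[1:] for w in words)
    return camel if upper_first else camel[0].lower() + camel[1:]


def is_usable_identifier(name: str) -> bool:
    """True if the name can be used unchanged as a Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ["Opening-Balance", "Opening_Balance", "OpeningBalance"]:
    print(name, is_usable_identifier(name), to_camel_case(name))
```

In this sketch the hyphenated name is the only one of the three that cannot be used as an identifier without renaming, while the camel case name passes through untouched.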

Naturally, this recommendation does vary a little by language. If you're writing your markup in a language like Hebrew or Chinese that does not distinguish upper and lower case, you can pretty much ignore this entire item. If you're marking up in a language like German where the nouns are distinguished by capitalization, you might choose to capitalize only the nouns. However, in English and many other languages, camel case is the most appropriate choice.

I do not have a strong opinion about whether the first letter of a camel-cased element or attribute name should be lower or upper case. As a Java programmer, I'm accustomed to seeing class names begin with uppercase letters and field names begin with lowercase letters. Probably because of that, an initial uppercase letter for elements and an initial lowercase letter for attributes seems more correct to me, but I freely admit that I can't rationalize that feeling. C# and Delphi programmers often have the opposite preference. All I can really recommend is that you pick one convention and stick with it.