Section 6.4. String-Only Operators

6.4.1. Format Operator ( `%` )

Python features a string format operator. This operator is unique to strings and makes up for the absence of C's printf() family of functions. In fact, it even uses the same symbol, the percent sign (%), and supports all of the printf() formatting codes.

The syntax for using the format operator is as follows:

format_string % (arguments_to_convert)

The format_string on the left-hand side is what you would typically find as the first argument to printf(): the format string with any of the embedded % codes. The set of valid codes is given in Table 6.4. The arguments_to_convert parameter matches the remaining arguments you would send to printf(), namely the set of variables to convert and display.

Table 6.4. Format Operator Conversion Symbols
Format Symbol	Conversion
`%c`	Character (integer [ASCII value] or string of length 1)
`%r`^[a]	String conversion via `repr()` prior to formatting
`%s`	String conversion via `str()` prior to formatting
`%d / %i`	Signed decimal integer
`%u`^[b]	Unsigned decimal integer
`%o`^[b]	(Unsigned) octal integer
`%x / %X`^[b]	(Unsigned) hexadecimal integer (lower/UPPERcase letters)
`%e / %E`	Exponential notation (with lowercase '`e`'/UPPERcase '`E`')
`%f / %F`	Floating point real number (fraction is rounded, to six digits by default)
`%g / %G`	The shorter of `%e` and `%f` / `%E` and `%F`
`%%`	Percent character ( `%` ) unescaped

^[a] New in Python 2.0; likely unique to Python.

^[b] %u/%o/%x/%X of negative int will return a signed string in Python 2.4.
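A few of the conversion symbols from Table 6.4 can be tried out as in the minimal sketch below. The list name `samples` is our own, and the code is written with `print(...)` so that it should run on Python 3 as well as on later 2.x releases; treat it as an illustration rather than an exhaustive tour of the table.

```python
# Each entry pairs a short description with the formatted result.
samples = [
    ('char from integer', '%c' % 65),         # 'A'
    ('char from string',  '%c' % 'A'),        # 'A'
    ('str() conversion',  '%s' % 'hi'),       # 'hi'
    ('repr() conversion', '%r' % 'hi'),       # "'hi'" (quotes are kept)
    ('decimal integer',   '%d' % 42),         # '42'
    ('octal integer',     '%o' % 8),          # '10'
    ('negative hex',      '%x' % -255),       # '-ff' (signed, see footnote b)
    ('small float, %g',   '%g' % 0.00001234), # '1.234e-05'
    ('plain float, %g',   '%g' % 1234.5),     # '1234.5'
]

for description, result in samples:
    print('%-18s %s' % (description, result))
```

Note how `%g` switches to exponential notation only when the exponent makes that form the shorter one.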

Python supports two formats for the input arguments. The first is a tuple (introduced in Section 2.8, formally in 6.15), which is basically the set of arguments to convert, just like for C's printf(). The second format that Python supports is a dictionary (Chapter 7). A dictionary is basically a set of hashed key-value pairs. The keys are requested in the format_string, and the corresponding values are provided when the string is formatted.

Converted strings can either be used with the print statement to display output to the user or be saved in a new string for later processing or for display in a graphical user interface.
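As a rough sketch of saving the converted string rather than printing it right away, the snippet below reuses one format string with several argument tuples. The names `template`, `line`, `report`, and `point` are hypothetical, chosen only for this illustration. It also shows one detail worth remembering: a tuple that is itself the single value to convert must be wrapped in a one-element tuple, or its items will be taken as the separate arguments.

```python
# One format string, reused with different argument tuples.
template = 'Host: %s, Port: %d'

line = template % ('mars', 80)   # result kept in a variable, not printed
report = [template % pair for pair in [('venus', 8080), ('earth', 443)]]

# Formatting a tuple as ONE value: wrap it in a one-element tuple.
point = (3, 4)
as_text = 'point = %s' % (point,)

print(line)      # Host: mars, Port: 80
print(report)
print(as_text)   # point = (3, 4)
```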

Other supported symbols and functionality are listed in Table 6.5.

Table 6.5. Format Operator Auxiliary Directives
Symbol	Functionality
`*`	Argument specifies width or precision
`-`	Use left justification
`+`	Use a plus sign ( + ) for positive numbers
`<sp>`	Use space-padding for positive numbers
`#`	Add the octal leading zero ('`0`') or hexadecimal leading '`0x`' or '`0X`', depending on whether '`x`' or '`X`' was used.
`0`	Use zero-padding (instead of spaces) when formatting numbers
`%`	'`%%`' leaves you with a single literal '`%`'
`(var)`	Mapping variable (dictionary arguments)
`m.n`	`m` is the minimum total width and `n` is the number of digits to display after the decimal point (if applicable)
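The directives in Table 6.5 are easiest to see side by side. The sketch below applies several of them to the same value; the names `value` and `rows` are ours, and the square brackets are there only to make the padding visible.

```python
value = 42

rows = [
    '[%5d]' % value,       # minimum width 5, right-justified: '[   42]'
    '[%-5d]' % value,      # '-' left-justifies:               '[42   ]'
    '[%+d]' % value,       # '+' forces a sign:                '[+42]'
    '[% d]' % value,       # space reserves the sign column:   '[ 42]'
    '[%05d]' % value,      # '0' pads with zeros:              '[00042]'
    '[%#x]' % value,       # '#' adds the hex prefix:          '[0x2a]'
    '[%8.3f]' % 3.14159,   # m.n = 8.3:                        '[   3.142]'
]

for row in rows:
    print(row)
```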

As with C's printf(), the asterisk symbol ( * ) may be used to dynamically indicate the width and precision via a value in the argument tuple. Before we get to our examples, one more word of caution: long integers are more than likely too large for conversion to standard integers, so we recommend using exponential notation to get them to fit.
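Here is a minimal sketch of the asterisk directive. The helper name `fixed` is hypothetical; the point is only that each `*` consumes one integer from the argument tuple, in order, before the value being converted.

```python
def fixed(value, width, precision):
    """Format a float using a width and precision chosen at run time."""
    # The first '*' takes `width`, the second takes `precision`,
    # and only then is `value` consumed by the 'f' conversion.
    return '%*.*f' % (width, precision, value)

narrow = fixed(3.14159, 8, 2)    # '    3.14'
wide = fixed(3.14159, 10, 4)     # '    3.1416'

# '*' combines with other directives, such as '-' for left justification.
label = '%-*s|' % (6, 'ab')      # 'ab    |'

print(narrow)
print(wide)
print(label)
```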

Here are some examples using the string format operator:

Hexadecimal Output

>>> "%x" % 108
'6c'
>>>
>>> "%X" % 108
'6C'
>>>
>>> "%#X" % 108
'0X6C'
>>>
>>> "%#x" % 108
'0x6c'

Floating Point and Exponential Notation Output

>>> '%f' % 1234.567890
'1234.567890'
>>>
>>> '%.2f' % 1234.567890
'1234.57'
>>>
>>> '%E' % 1234.567890
'1.234568E+03'
>>>
>>> '%e' % 1234.567890
'1.234568e+03'
>>>
>>> '%g' % 1234.567890
'1234.57'
>>>
>>> '%G' % 1234.567890
'1234.57'
>>>
>>> "%e" % (1111111111111111111111L)
'1.111111e+21'

Integer and String Output

>>> "%+d" % 4
'+4'
>>>
>>> "%+d" % -4
'-4'
>>>
>>> "we are at %d%%" % 100
'we are at 100%'
>>>
>>> 'Your host is: %s' % 'earth'
'Your host is: earth'
>>>
>>> 'Host: %s\tPort: %d' % ('mars', 80)
'Host: mars\tPort: 80'
>>>
>>> num = 123
>>> 'dec: %d/oct: %#o/hex: %#X' % (num, num, num)
'dec: 123/oct: 0173/hex: 0X7B'
>>>
>>> "MM/DD/YY = %02d/%02d/%d" % (2, 15, 67)
'MM/DD/YY = 02/15/67'
>>>
>>> w, p = 'Web', 'page'
>>> 'http://xxx.yyy.zzz/%s/%s.html' % (w, p)
'http://xxx.yyy.zzz/Web/page.html'

The previous examples all use tuple arguments for conversion. Below, we show how to use a dictionary argument for the format operator:

>>> 'There are %(howmany)d %(lang)s Quotation Symbols' % \
...     {'lang': 'Python', 'howmany': 3}
'There are 3 Python Quotation Symbols'

Amazing Debugging Tool

The string format operator is not only a cool, easy-to-use, and familiar feature, but a great and useful debugging tool as well. Practically all Python objects have a string representation (either evaluatable from repr() or backquotes (``), or printable from str()). The print statement automatically invokes the str() function for an object. This gets even better. When you are defining your own objects, there are hooks for you to create string representations of your object such that repr() and str() (and `` and print) return an appropriate string as output. And if worse comes to worst and neither repr() nor str() is able to display an object, the Pythonic default is at least to give you something of the format:

<... something that is useful ...>
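The hooks in question are the special methods `__repr__()` and `__str__()`. The sketch below uses two hypothetical classes of our own, `Point` and `Bare`, to show how `%r` and `%s` pick up those hooks, and what the default looks like when a class defines neither.

```python
class Point(object):
    """Toy class that defines both string hooks."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Evaluatable form, used by %r and repr()
        return 'Point(%r, %r)' % (self.x, self.y)

    def __str__(self):
        # Printable form, used by %s, str(), and print
        return '(%s, %s)' % (self.x, self.y)


class Bare(object):
    """Defines neither hook, so the default representation is used."""
    pass


p = Point(1, 2)
debug_line = 'DEBUG: p is %r, displayed as %s' % (p, p)
fallback = '%r' % Bare()   # something like <... Bare object at 0x...>

print(debug_line)
print(fallback)
```

Dropping a `%r` into a temporary print line like this is often the quickest way to see exactly what an object holds.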

6.4.2. String Templates: Simpler Substitution

The string format operator has been a mainstay of Python and will continue to be so. One of its drawbacks, however, is that it is not as intuitive to the new Python programmer who does not come from a C/C++ background. Even experienced developers using the dictionary form can accidentally leave off the type format symbol, e.g., writing %(lang) instead of the correct %(lang)s. In addition to remembering to put in the correct formatting directive, the programmer must also know the type of the value, i.e., whether it is a string, an integer, etc.

The motivation for the new string templates is to do away with having to remember such details and to use string substitution much like that in current shell-type scripting languages, which mark substitutions with the dollar sign ( $ ).

The string module is temporarily resurrected from the dead as the new Template class has been added to it. Template objects have two methods, substitute() and safe_substitute(). The former is stricter, raising a KeyError exception for a missing key, while the latter leaves the substitution placeholder intact when a key is missing:

>>> from string import Template
>>> s = Template('There are ${howmany} ${lang} Quotation Symbols')
>>>
>>> print s.substitute(lang='Python', howmany=3)
There are 3 Python Quotation Symbols
>>>
>>> print s.substitute(lang='Python')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/string.py", line 172, in substitute
    return self.pattern.sub(convert, self.template)
  File "/usr/local/lib/python2.4/string.py", line 162, in convert
    val = mapping[named]
KeyError: 'howmany'
>>>
>>> print s.safe_substitute(lang='Python')
There are ${howmany} Python Quotation Symbols

The new string templates were added to Python in version 2.4. More information about them can be found in the Python Library Reference Manual and PEP 292.
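A few further details of the template syntax are sketched below, using a made-up template and made-up values of our own: the braces are optional when the name is not followed by identifier characters, `$$` produces a literal dollar sign, and substitute() also accepts a dictionary in place of keyword arguments.

```python
from string import Template

# $who needs no braces; ${item}s needs them so the trailing 's' is not
# read as part of the name; $$ is an escaped dollar sign.
bill = Template('$who owes $$$amount for ${item}s')

by_keyword = bill.substitute(who='Ann', amount=5, item='ticket')
by_mapping = bill.substitute({'who': 'Bob', 'amount': 12, 'item': 'book'})

print(by_keyword)   # Ann owes $5 for tickets
print(by_mapping)   # Bob owes $12 for books
```

Notice that no type codes appear anywhere: the integer amounts are converted to strings automatically.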

6.4.3. Raw String Operator ( r / R )

The purpose of raw strings, introduced back in version 1.5, is to counteract the behavior of the special escape characters that occur in strings (see the subsection below on what some of these characters are). In raw strings, all characters are taken verbatim with no translation to special or non-printed characters.

This feature makes raw strings especially convenient when such behavior is desired, such as when composing regular expressions (see the re module documentation). Regular expressions (REs) are strings that define advanced search patterns for strings and usually consist of special symbols to indicate characters, grouping and matching information, variable names, and character classes. The syntax for REs contains enough symbols already, but when you have to insert additional symbols to make special characters act like normal characters, you end up with a virtual "alphanumersymbolic" soup! Raw strings lend a helping hand by not requiring the extra backslashes normally needed when composing RE patterns.

The syntax for raw strings is exactly the same as for normal strings with the exception of the raw string operator, the letter "r," which precedes the quotation marks. The "r" can be lowercase (r) or uppercase (R) and must be placed immediately preceding the first quote mark.

In the first of our three examples, we really want a backslash followed by an 'n' as opposed to a NEWLINE character:

>>> '\n'
'\n'
>>> print '\n'


>>> r'\n'
'\\n'
>>> print r'\n'
\n

Next, we cannot seem to open our README file. Why not? Because \t and \r are taken as special characters (TAB and carriage return) rather than as what we intended: four individual characters, a backslash followed by 't' and a backslash followed by 'r', that are part of our file pathname.

>>> f = open('C:\windows\temp\readme.txt', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
    f = open('C:\windows\temp\readme.txt', 'r')
IOError: [Errno 2] No such file or directory: 'C:\\windows\temp\readme.txt'
>>> f = open(r'C:\windows\temp\readme.txt', 'r')
>>> f.readline()
'Table of Contents (please check timestamps for last update!)\n'
>>> f.close()

Finally, we are (ironically) looking for a raw pair of characters \n and not NEWLINE. In order to find it, we are attempting to use a simple regular expression that looks for backslash-character pairs that are normally single special whitespace characters:

  >>> import re
  >>> m = re.search('\\[rtfvn]', r'Hello World!\n')
  >>> if m is not None: m.group()
  ...
  >>> m = re.search(r'\\[rtfvn]', r'Hello World!\n')
  >>> if m is not None: m.group()
  ...
  '\\n'
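Why does the first search come up empty? The sketch below, with variable names of our own choosing, compares what the regular expression engine actually receives in each case. Without the raw string operator, Python collapses the doubled backslash before the pattern ever reaches the re module, so the engine sees an escaped bracket and looks for the literal text "[rtfvn]".

```python
import re

text = r'Hello World!\n'    # ends with a backslash followed by 'n'

plain = '\\[rtfvn]'         # Python turns '\\' into ONE backslash
raw = r'\\[rtfvn]'          # both backslashes reach the RE engine

print(len(plain))           # 8 characters:  \[rtfvn]
print(len(raw))             # 9 characters:  \\[rtfvn]

# The plain pattern means "a literal '[' followed by 'rtfvn]'".
no_match = re.search(plain, text)
literal_match = re.search(plain, 'keys: [rtfvn]')

# The raw pattern means "a backslash followed by one of r, t, f, v, n".
match = re.search(raw, text)

# Without a raw string, the same pattern needs four backslashes.
same_pattern = ('\\\\[rtfvn]' == raw)

print(no_match)                  # None
print(literal_match.group())     # [rtfvn]
print(repr(match.group()))       # '\\n'
print(same_pattern)              # True
```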

6.4.4. Unicode String Operator ( u / U )

The Unicode string operator, uppercase (U) and lowercase (u), introduced with Unicode string support in Python 1.6, takes standard strings or strings with Unicode characters in them and converts them to a full Unicode string object. More details on Unicode strings are available in Section 6.7.4. In addition, Unicode support is available via string methods (Section 6.6) and the regular expression engine. Here are some examples:

  u'abc'         U+0061 U+0062 U+0063
  u'\u1234'      U+1234
  u'abc\u1234\n' U+0061 U+0062 U+0063 U+1234 U+000A
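The code points of a Unicode string can be checked with ord() and the format operator from Section 6.4.1. The helper name `code_points` below is hypothetical. The sketch assumes an interpreter that accepts the u prefix, which to our knowledge means Python 2.x or Python 3.3 and newer.

```python
def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    # %04X: uppercase hexadecimal, zero-padded to at least four digits
    return ' '.join('U+%04X' % ord(ch) for ch in text)

ascii_only = code_points(u'abc')
single = code_points(u'\u1234')
mixed = code_points(u'abc\u1234\n')   # the NEWLINE is U+000A

print(ascii_only)   # U+0061 U+0062 U+0063
print(single)       # U+1234
print(mixed)        # U+0061 U+0062 U+0063 U+1234 U+000A
```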

The Unicode operator can also accept raw Unicode strings if used in conjunction with the raw string operator discussed in the previous section. The Unicode operator must precede the raw string operator.

ur'Hello\nWorld!'