5.4 String Methods

In addition to expression operators, strings provide a set of methods that implement more sophisticated text-processing tasks. Methods are simply functions that are associated with a particular object. Technically, they are attributes attached to objects that happen to reference callable functions. In Python, methods are specific to object types; string methods, for example, only work on string objects.

Functions are packages of code, and method calls combine two operations at once (an attribute fetch and a call):

Attribute fetches: An expression of the form object.attribute means "fetch the value of attribute in object."
Call expressions: An expression of the form function(arguments) means "invoke the code of function, passing zero or more comma-separated argument objects to it, and returning the function's result value."

Putting these two together allows us to call a method of an object. The method call expression object.method(arguments) is evaluated from left to right: Python first fetches the method of the object, and then calls it, passing in the arguments. If the method computes a result, it comes back as the result of the entire method call expression.
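As a minimal sketch of these two steps, the attribute fetch and the call can be run separately. The names S and bound are arbitrary and chosen here only for illustration:

```python
S = 'spam'

# Step 1: attribute fetch. This yields a bound method object without running it.
bound = S.upper

# Step 2: call expression. This invokes the method fetched in step 1.
result_two_steps = bound()

# The usual form performs both steps in a single expression.
result_one_step = S.upper()

print(result_two_steps)   # SPAM
print(result_one_step)    # SPAM
```

Both forms produce the same result; the one-step form is simply the fetch and the call written together.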

As you'll see throughout Part II, most objects have callable methods, and all are accessed using this same method call syntax. To call an object method, you have to go through an existing object; let's move on to some examples to see how.

5.4.1 String Method Examples: Changing Strings

Table 5-4 summarizes the call patterns for built-in string methods. They implement higher-level operations, like splitting and joining, case conversions and tests, and substring searches. Let's work through some code that demonstrates some of the most commonly used methods in action, and presents Python text-processing basics along the way.

Table 5-4. String method calls
S.capitalize( )	S.ljust(width)
S.center(width)	S.lower( )
S.count(sub [, start [, end]])	S.lstrip( )
S.encode([encoding [,errors]])	S.replace(old, new [, count])
S.endswith(suffix [, start [, end]])	S.rfind(sub [,start [,end]])
S.expandtabs([tabsize])	S.rindex(sub [, start [, end]])
S.find(sub [, start [, end]])	S.rjust(width)
S.index(sub [, start [, end]])	S.rstrip( )
S.isalnum( )	S.split([sep [,maxsplit]])
S.isalpha( )	S.splitlines([keepends])
S.isdigit( )	S.startswith(prefix [, start [, end]])
S.islower( )	S.strip( )
S.isspace( )	S.swapcase( )
S.istitle( )	S.title( )
S.isupper( )	S.translate(table [, delchars])
S.join(seq)	S.upper( )
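A quick sketch exercising a handful of the calls in Table 5-4 may help make the call patterns concrete. The sample string and the result names are arbitrary and used only for illustration:

```python
S = 'spam and eggs'

upper = S.upper()               # case conversion
title = S.title()               # capitalize each word
all_letters = S.isalpha()       # content test: False, because of the spaces
starts = S.startswith('spam')   # prefix test
how_many = S.count('s')         # number of occurrences of a substring
offset = S.find('eggs')         # offset of a substring, or -1 if absent
trimmed = '  padded  '.strip()  # remove leading and trailing whitespace

print(upper)        # SPAM AND EGGS
print(title)        # Spam And Eggs
print(all_letters)  # False
print(starts)       # True
print(how_many)     # 2
print(offset)       # 9
print(trimmed)      # padded
```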

Because strings are immutable, they cannot be changed in-place directly. To make a new text value, you can always construct a new string with operations such as slicing and concatenating. For example, to replace two characters in the middle of a string:

>>> S = 'spammy'
>>> S = S[:3] + 'xx' + S[5:]
>>> S
'spaxxy'

But if you're really just out to replace a substring, you can use the string replace method instead:

>>> S = 'spammy'
>>> S = S.replace('mm', 'xx')
>>> S
'spaxxy'

The replace method is more general than this code implies. It takes as arguments the original substring (of any length) and the string (of any length) to replace it with, and performs a global search-and-replace:

>>> 'aa$bb$cc$dd'.replace('$', 'SPAM')
'aaSPAMbbSPAMccSPAMdd'

In such roles, replace can be used to implement template-replacement tools (e.g., form letters). Notice that this time we simply printed the result instead of assigning it to a name; you need to assign results to names only if you want to retain them for later use. If you need to replace one fixed-size string that can occur at any offset, you can either do a replacement again, or search for the substring with the string find method and then slice:

>>> S = 'xxxxSPAMxxxxSPAMxxxx'
>>> where = S.find('SPAM')          # Search for position
>>> where                           # Occurs at offset 4
4
>>> S = S[:where] + 'EGGS' + S[(where+4):]
>>> S
'xxxxEGGSxxxxSPAMxxxx'

The find method returns the offset where the substring appears (by default, searching from the front), or -1 if it is not found. Another way is to use replace with a third argument to limit it to a single substitution:

>>> S = 'xxxxSPAMxxxxSPAMxxxx'
>>> S.replace('SPAM', 'EGGS')           # Replace all
'xxxxEGGSxxxxEGGSxxxx'
>>> S.replace('SPAM', 'EGGS', 1)        # Replace one
'xxxxEGGSxxxxSPAMxxxx'

Notice that replace is returning a new string each time here. Because strings are immutable, methods never really change the subject string in-place, even if they are called "replace."
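The find-and-slice technique assumes that the substring is actually present. As a hedged sketch, the hypothetical helper replace_first below (not a built-in, and named here only for illustration) wraps that technique with a check for find's -1 result, and confirms that the original string is left untouched:

```python
def replace_first(text, old, new):
    """Replace the first occurrence of old in text using find and slicing.

    Hypothetical helper for illustration; returns text unchanged if old is absent.
    """
    where = text.find(old)
    if where == -1:            # find signals "not found" with -1
        return text
    return text[:where] + new + text[where + len(old):]

S = 'xxxxSPAMxxxxSPAMxxxx'
T = replace_first(S, 'SPAM', 'EGGS')

print(T)                                  # xxxxEGGSxxxxSPAMxxxx
print(T == S.replace('SPAM', 'EGGS', 1))  # True
print(replace_first(S, 'HAM', 'EGGS'))    # xxxxSPAMxxxxSPAMxxxx (no match)
print(S)                                  # xxxxSPAMxxxxSPAMxxxx (original unchanged)
```

Without the -1 check, a failed search would be used as a slice offset and silently produce a mangled string, which is one reason replace with a count argument is usually the simpler choice.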

In fact, one potential downside of using either concatenation or the replace method to change strings is that both generate new string objects each time they are run. If you have to apply many changes to a very large string, you might be able to improve your script's performance by converting the string to an object that does support in-place changes:

>>> S = 'spammy'
>>> L = list(S)
>>> L
['s', 'p', 'a', 'm', 'm', 'y']

The built-in list function (or an object construction call) builds a new list out of the items in any sequence; in this case, it "explodes" the characters of a string into a list. Once in this form, you can make multiple changes without generating copies of the string for each change:

>>> L[3] = 'x'              # Works for lists, not strings
>>> L[4] = 'x'
>>> L
['s', 'p', 'a', 'x', 'x', 'y']

If, after your changes, you need to convert back to a string (e.g., to write to a file), use the string join method to "implode" the list back into a string:

>>> S = ''.join(L)
>>> S
'spaxxy'

The join method may look a bit backward at first sight. Because it is a method of strings (not of the list of strings), it is called through the desired delimiter. join puts the list's strings together, with the delimiter between list items; in this case, it uses an empty string delimiter to convert from a list back to a string. More generally, any string delimiter and list of strings will do:

>>> 'SPAM'.join(['eggs', 'sausage', 'ham', 'toast'])
'eggsSPAMsausageSPAMhamSPAMtoast'
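As a small sketch of join in a more typical role, the code below rebuilds a delimited record from its fields and shows that join expects every item to already be a string. The names fields, record, numbers, and dashed are arbitrary illustrations:

```python
fields = ['bob', 'hacker', '40']

# Called through the delimiter: the comma goes between the list items.
record = ','.join(fields)
print(record)                        # bob,hacker,40

# Splitting on the same delimiter recovers the original list.
print(record.split(',') == fields)   # True

# join accepts only strings, so other objects must be converted first.
numbers = [1, 2, 3]
dashed = '-'.join([str(n) for n in numbers])
print(dashed)                        # 1-2-3
```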

5.4.2 String Method Examples: Parsing Text

Another common role for string methods is as a simple form of text parsing, that is, analyzing structure and extracting substrings. To extract substrings at fixed offsets, we can employ slicing techniques:

>>> line = 'aaa bbb ccc'
>>> col1 = line[0:3]
>>> col3 = line[8:]
>>> col1
'aaa'
>>> col3
'ccc'

Here, the columns of data appear at fixed offsets, and so may be sliced out of the original string. This technique passes for parsing, as long as your data has fixed positions for its components. If the data is separated by some sort of delimiter instead, we can pull out its components by splitting, even if the data may show up at arbitrary positions within the string:

>>> line = 'aaa bbb   ccc'
>>> cols = line.split()
>>> cols
['aaa', 'bbb', 'ccc']

The string split method chops up a string into a list of substrings, around a delimiter string. We didn't pass a delimiter in the prior example, so it defaults to whitespace: the string is split at groups of one or more spaces, tabs, and newlines, and we get back a list of the resulting substrings. In other applications, the data may be separated by more tangible delimiters, such as keywords or commas:

>>> line = 'bob,hacker,40'
>>> line.split(',')
['bob', 'hacker', '40']

This example splits (and hence parses) the string at commas, a separator common in data returned by some database tools. Delimiters can be longer than a single character too:

>>> line = "i'mSPAMaSPAMlumberjack"
>>> line.split("SPAM")
["i'm", 'a', 'lumberjack']

Although there are limits to the parsing potential of slicing and splitting, both run very fast, and can handle basic text-extraction chores.
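To illustrate one such chore, here is a minimal sketch that splits comma-separated records and converts the fields to useful types. The helper parse_record and the three-field name, job, age layout are assumptions made for this example only:

```python
def parse_record(line):
    """Hypothetical helper: split a 'name,job,age' line into typed fields."""
    name, job, age = line.split(',')           # expects exactly three fields
    return name.strip(), job.strip(), int(age)  # trim padding, convert the age

records = ['bob,hacker,40', 'sue, manager ,35']
parsed = [parse_record(line) for line in records]

print(parsed)   # [('bob', 'hacker', 40), ('sue', 'manager', 35)]
```

A line with more or fewer than three fields would raise an exception at the unpacking step, which is one of the limits of this style of parsing.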

You'll meet additional string examples later in this book. For more details, also see the Python library manual and other documentation sources, or simply experiment with these interactively on your own. Note that none of the string methods accepts patterns; for pattern-based text processing, you must use the Python re standard library module. Because of this limitation, though, string methods sometimes run more quickly than the re module's tools.
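For comparison, here is a brief sketch of a case that a single split call cannot handle but a pattern can: fields separated by either commas or semicolons, with irregular spacing. The sample line and the pattern are illustrative choices, not taken from the text above:

```python
import re

line = 'aaa, bbb;ccc ,  ddd'

# Split on a comma or semicolon, absorbing any whitespace around it.
cols = re.split(r'\s*[,;]\s*', line)

print(cols)   # ['aaa', 'bbb', 'ccc', 'ddd']
```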

5.4.3 The Original Module

Python's string method story is somewhat convoluted by history. For roughly the first decade of Python's existence, it provided a standard library module called string, which contained functions that largely mirror the current set of string object methods. Later, in Python 2.0 (and the short-lived 1.6), these functions were made available as methods of string objects, in response to user requests. Because so many people wrote so much code that relied on the original string module, it is retained for backward compatibility.

The upshot of this legacy is that today, there are usually two ways to invoke advanced string operations: by calling object methods, or by calling string module functions and passing in the object as an argument. For instance, given a variable X assigned to a string object, calling an object method:

X.method(arguments)

is usually equivalent to calling the same operation through the module:

string.method(X, arguments)

provided that you have already imported the module string. Here's an example of both call patterns in action. First, the method scheme:

>>> S = 'a+b+c+'
>>> x = S.replace('+', 'spam')
>>> x
'aspambspamcspam'

To access the same operation through the module, you need to import the module (at least once in your process), and pass in the object:

>>> import string
>>> y = string.replace(S, '+', 'spam')
>>> y
'aspambspamcspam'

Because the module approach was the standard for so long, and because strings are such a central component of most programs, you will probably see both call patterns in Python code you come across.

Today, though, the general recommendation is to use methods instead of the module. The module call scheme requires you to import the string module (methods do not). The string module makes calls a few characters longer to type (at least when you load the module with import, but not for from). In addition, the module may run more slowly than methods (the current module maps most calls back to the methods, and so incurs an extra call along the way).

On the other hand, because the overlap between module and method tools is not exact, you may still sometimes need to use either scheme: some tools are available only as methods, and some only as module functions. In addition, some programmers prefer to use the module call pattern, because the module's name makes it more obvious that code is calling string tools: string.method(x) seems more self-documenting than x.method( ) to some. As always, the choice should ultimately be yours to make.
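As one hedged illustration of a module-only tool, string.capwords has no direct method counterpart; the closest method, title, behaves differently around apostrophes. This sketch assumes Python 3, where capwords is still provided by the module but function equivalents of methods, such as string.replace, no longer are. The sample text and result names are arbitrary:

```python
import string

S = "i'm a lumberjack"

# Module function: capitalizes each whitespace-separated word.
by_module = string.capwords(S)

# Closest method: also treats the apostrophe as a word boundary.
by_method = S.title()

print(by_module)   # I'm A Lumberjack
print(by_method)   # I'M A Lumberjack
```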
