Chapter 10. Working with Strings | Python Programming with the Java™ Class Libraries: A Tutorial for Building Web and Enterprise Applications with Jython

CONTENTS

Conversion: atoi(), atof(), atol()
Case Change: capitalize(), capwords(), swapcase(), lower(), upper()
Finding: find(), rfind(), index(), rindex(), count(), replace()
Splitting and Joining: split(), splitfields(), join(), joinfields()
Stripping and Parsing: lstrip(), rstrip(), strip()
Adjusting Text: ljust(), rjust(), center(), zfill(), expandtabs()
Summary

Terms in This Chapter

Argument
Decimal 10
Exception
Field
Function
Hexadecimal 16
Namespace
Octal 8
Parsing
Separator
Sequence
string module
Substring
Whitespace

Sooner or later, you'll need to format, parse, or manipulate strings. For these tasks, you'll likely use the functions in the Python string module. Spend some time familiarizing yourself with this module by following along with the examples in this chapter.

Conversion: atoi(), atof(), atol()

atoi(s[,base]) converts a string into an integer. The default base is 10 (decimal), but you can also specify 8 (octal) or 16 (hexadecimal). If the base is 0, the string is parsed as hexadecimal if it has a leading 0x and as octal if it has a leading 0; otherwise, it is treated as decimal.

Let's do an example. In this and all other examples in this chapter, you have to first import the string module: from string import *.

Convert "1" to an integer.

>>> atoi("1")
1

Convert "255" to a base 10 integer.

>>> atoi("255",10)
255

Convert "FF" to a base 16 integer.

>>> atoi("FF",16)
255

The atof(s) function converts a string to a float.

>>> atof("1.1")
1.1

The atol(s[, base]) function converts a string to a long integer.

>>> atol("1")
1L
>>> atol("FF", 16)
255L
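The base rules described above can also be tried with the built-in int(), which accepts the same string and base arguments as atoi(). What follows is only a sketch: the helper name to_int and its default argument are hypothetical, and returning a default on bad input is one possible design choice, not something the string module does for you.

```python
def to_int(text, base=10, default=None):
    """Convert text to an integer; return default if it cannot be parsed.

    to_int is a hypothetical helper written for this illustration.
    """
    try:
        return int(text, base)
    except ValueError:
        return default

print(to_int("255"))        # decimal is the default base
print(to_int("FF", 16))     # hexadecimal digits
print(to_int("377", 8))     # octal digits
print(to_int("0xFF", 0))    # base 0: the 0x prefix selects hexadecimal
print(to_int("apple"))      # not a number, so the default comes back
```

All four numeric strings above describe the same value, which makes it easy to see that only the notation changes with the base.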

Case Change: capitalize(), capwords(), swapcase(), lower(), upper()

The capitalize(word) function capitalizes the first letter of a string.

>>> capitalize("bill")
'Bill'

The capwords(s) function capitalizes all words in a string.

>>> str = "bill joy"
>>> str = capwords(str)
>>> print str
Bill Joy

The swapcase(s) function converts uppercase letters to lowercase letters and vice versa.

>>> swapcase("ABC abc 123")
'abc ABC 123'

(Frankly, I don't see the value of this one.)

The lower(s) function converts uppercase letters to lowercase letters.

>>> lower("ABC abc 123")
'abc abc 123'

The upper(s) function converts lowercase letters to uppercase letters.

>>> upper("ABC abc 123")
'ABC ABC 123'
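A common use of the case functions is comparing text without regard to case. Here is a minimal sketch of that idea, written with string methods on the assumption that your interpreter supports them; same_word is a hypothetical helper name, not part of the string module.

```python
import string

def same_word(a, b):
    """Compare two strings while ignoring differences in case."""
    return a.lower() == b.lower()

print(same_word("Jython", "JYTHON"))     # case differences are ignored
print(same_word("Jython", "Python"))     # different words still differ

# capwords() also collapses runs of whitespace between words,
# while capitalize() touches only the first character of the string.
print(string.capwords("bill   joy"))
print("bill joy".capitalize())
```

Lowercasing both sides before comparing leaves the original strings untouched, so you can still display them as the user typed them.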

Finding: find(), rfind(), index(), rindex(), count(), replace()

The finding functions in the string module locate a substring within a string. For example, substrings of "Python is fun" are "Pyt", "is", "fun", "n is f", and so forth. Using a substring helps in parsing string data.

The find(s, sub, [start],[end]) function returns the position of the first occurrence of a substring in a given string. The optional start and end arguments determine where in the string the search begins and ends. Here's an example:

>>> str = "apple peach pear grapes apple lime lemon"
>>> position = find(str, "pear")
>>> print position
12

Here's a real-world use of find(): extracting text out of a tag when reading in an HTML file from a server.

Create some sample text embedded in HTML tags.

>>> #simulated input string from some file
>>> str = "<h1> text we want to extract </h1>"

Set the start and stop strings (the HTML tags).

>>> start = "<h1>"                #html tag
>>> stop = "</h1>"                #html tag

Find the position of the first and second strings.

>>> begin = find(str,start)       #find the location of the 1st tag
>>> end = find(str,stop)          #find the location of the 2nd tag

Locate the text to be extracted.

>>> #compute where the start of the string we want is:
>>> begin = begin + len(start)

Extract the text embedded in the HTML tags, and display it.

>>> #using slice notation extract the text from the string
>>> text = str[begin:end]
>>> print text
text we want to extract

The HTML tags supply the boundaries of the desired text.
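The tag-extraction steps can be gathered into one reusable function. This is a sketch under a few assumptions: extract_between is a hypothetical name, it uses the find() string method rather than the module function, and it returns None when a marker is missing, which is just one reasonable way to report failure.

```python
def extract_between(text, start, stop):
    """Return the text between the first start marker and the next stop marker.

    Returns None when either marker cannot be found.
    """
    begin = text.find(start)
    if begin == -1:
        return None
    begin = begin + len(start)        # skip past the start marker itself
    end = text.find(stop, begin)      # search only after the start marker
    if end == -1:
        return None
    return text[begin:end]

page = "<h1> text we want to extract </h1>"
print(extract_between(page, "<h1>", "</h1>"))
print(extract_between(page, "<h2>", "</h2>"))   # no such tag in the page
```

Passing begin as the start argument of the second find() keeps a stray stop marker earlier in the string from producing an empty or backward slice.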

rfind(s, sub, [start],[end]) is similar to find(), but it searches for the substring from right to left. Here it finds the last occurrence of "apple" in the str string.

>>> str = "apple orange tangerine apple pear grapes"
>>> rfind(str,"apple")
23
>>> find(str, "apple") #find finds the first occurrence
0

index(s, sub, [start],[end]) works like find() with one difference. When find() can't locate a substring, it returns -1; when index() can't, it throws an exception.

>>> find(str, "commodore")
-1
>>> index(str, "commodore")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "D:\Apps\Python\Lib\string.py", line 226, in index
ValueError: substring not found in string.index

rindex(s, sub, [start],[end]) searches from the back of the string for a substring. It's like rfind(), but throws an exception if it fails.

Find "green" in the str string.

>>> str = "blue blue blue green red red red"
>>> rindex(str,"green")
15

Find "purple".

>>> rindex(str, "purple")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "D:\Apps\Python\Lib\string.py", line 243, in rindex
ValueError: substring not found in string.index
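The two failure styles call for two different coding patterns. The sketch below shows both side by side, using the string-method forms of find() and index() on the assumption that your interpreter provides them; the variable names are only illustrative.

```python
fruit = "apple orange tangerine apple pear grapes"

# find() signals failure with -1, so test the result before using it.
position = fruit.find("commodore")
if position == -1:
    print("find: not found")

# index() signals failure with an exception, so wrap it in try/except.
try:
    position = fruit.index("commodore")
except ValueError:
    print("index: not found")

# The same distinction applies to the right-to-left versions.
print(fruit.find("apple"))     # first occurrence
print(fruit.rfind("apple"))    # last occurrence
```

One thing to watch with find(): -1 is also a legal index meaning "the last character", so forgetting the test can lead to a wrong answer rather than an error.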

count(s, sub, [start],[end]) finds the number of occurrences of a substring in a string.

Count "blue" in the str string.

>>> str = "blue blue blue green red red red"
>>> count(str, "blue")
3

Count "red".

>>> count(str, "red")
3

replace(str, old, new, [max]) replaces one substring with a new one. The max argument specifies the number of occurrences you want replaced. The default is all occurrences.

Create a string with four "apple" substrings.

>>> str = "apple, apple, apple, apple"

Replace the first "apple" with "pear".

>>> replace(str, "apple", "pear", 1)
'pear, apple, apple, apple'

Replace every occurrence of "apple" with "orange".

>>> replace(str, "apple", "orange")
'orange, orange, orange, orange'
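The optional arguments to count() and replace() are easy to overlook, so here is a short sketch that exercises them. It uses the string-method forms, assuming your interpreter has them, and the sample data is made up for the illustration.

```python
colors = "blue blue blue green red red red"

print(colors.count("blue"))       # every occurrence
print(colors.count("blue", 5))    # only occurrences at position 5 or later

# Replace at most two occurrences, working from the left.
limited = colors.replace("red", "pink", 2)
print(limited)

# replace() returns a new string; the original is left as it was.
print(colors)
```

Because strings can't be changed in place, you have to assign the result of replace() to a name if you want to keep it.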

Splitting and Joining: split(), splitfields(), join(), joinfields()

split(s, [sep], [maxsplit]) and splitfields(s, [sep], [maxsplit]) both split a string into a sequence. With the sep argument you can specify the separator; the default is whitespace (spaces, tabs, or newlines). The optional maxsplit argument limits the number of splits performed; the default is to split at every separator.

Here, with one line of code, we parse an address containing five fields. (Try to do this with Java, C, Delphi, or Visual Basic; you can't.)

>>> input_string = "Bill,Gates,123 Main St., WA, 65743"
>>> fname, lname, street, state, zip = split(input_string,",")
>>> print """
... Name: %(fname)s %(lname)s
... Street: %(street)s
... %(state)s, %(zip)s""" % locals()

Name: Bill Gates
Street: 123 Main St.
 WA,  65743

A Few Things to Note

You can assign multiple variables to a sequence (Chapter 12).
The locals() built-in function (Chapter 9) returns a dictionary that contains the variables in a local namespace, so the statement
```
>>> locals()["lname"]
```
returns
```
'Gates'
```
The % format string operator works with dictionaries or sequences (Chapter 3).
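When you split on a comma alone, any space that follows the comma stays attached to the next field. One way to tidy that up is to strip each field after splitting. The sketch below assumes the string-method forms of split() and strip() are available, and it builds its own dictionary for the % operator instead of relying on locals(); the names tuple is hypothetical.

```python
record = "Bill,Gates,123 Main St., WA, 65743"

# Split on commas, then strip the whitespace left around each field.
fields = [field.strip() for field in record.split(",")]

# Pair each field with a name so the % operator can look it up.
names = ("fname", "lname", "street", "state", "zip")
address = dict(zip(names, fields))

print("Name: %(fname)s %(lname)s" % address)
print("Street: %(street)s" % address)
print("%(state)s, %(zip)s" % address)
```

Using an explicit dictionary keeps the formatting step independent of whatever other variables happen to be in the local namespace.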

Here we demonstrate that split() and splitfields() do the same thing:

>>> split (input_string)
['Bill,Gates,123', 'Main', 'St.,', 'WA,', '65743']
>>> splitfields(input_string)
['Bill,Gates,123', 'Main', 'St.,', 'WA,', '65743']

Here's an example demonstrating the default operation for split():

>>> split("tab\tspace word1 word2        word3\t\t\tword4")
['tab', 'space', 'word1', 'word2', 'word3', 'word4']
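The maxsplit argument hasn't had an example yet, so here is a small sketch of it. It assumes the string-method form of split(), and the "key = value" line is invented for the illustration.

```python
line = "timeout = 30 = seconds"

# Split only at the first "=", so the value keeps any later "=" signs.
key, value = line.split("=", 1)
print(key.strip())
print(value.strip())

# Passing None as the separator keeps the default whitespace splitting
# while still limiting the number of splits.
print("a b c d".split(None, 2))
```

Limiting the split to one is a handy way to unpack into exactly two names without worrying about how many separators the rest of the line contains.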

join(words, [sep]) and joinfields(words, [sep]) also do the same thing. Here's an example showing how to build an address string from a sequence of fields:

>>> seq = (fname, lname, street, state, zip)
>>> input_string = join(seq, ",")
>>> print input_string
Bill,Gates,123 Main St., WA, 65743

The next two examples demonstrate the similarities of join() and joinfields():

>>> seq = ("1","2","3","4","5")
>>> join(seq, "#")
'1#2#3#4#5'
>>> joinfields(seq,"#")
'1#2#3#4#5'
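Joining works only on sequences of strings, so other values have to be converted first. The following sketch, which assumes the string-method form of join() is available, shows the conversion and then checks that splitting the result gets the pieces back.

```python
numbers = [1, 2, 3, 4, 5]

# join() needs strings, so convert each number before joining.
pieces = [str(n) for n in numbers]
joined = "#".join(pieces)
print(joined)

# Splitting on the same separator reverses the join.
print(joined.split("#") == pieces)
```

The round trip holds as long as the separator doesn't appear inside any of the pieces.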

Stripping and Parsing: lstrip(), rstrip(), strip()

When you parse strings, you often need to get rid of whitespace. This is what the stripping functions do. They're handy and convenient; I think you'll use them quite a bit.

Whitespace Variables

Whitespace is defined by the public variable whitespace in the string module. The value shown here contains a tab and spaces:

>>> whitespace
'\ 11 '

The lstrip(s) (left strip) function removes leading whitespace (on the left) in the string. The rstrip(s) (right strip) function removes the trailing whitespace (on the right). The strip(s) function removes both leading and trailing whitespace. Here's an example of all three:

>>> str = "      String    String        "
>>> lstrip(str)
'String    String        '
>>> rstrip(str)
'      String    String'
>>> strip(str)
'String    String'
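Stripping is especially useful on lines read from a file, which usually arrive with a trailing newline and sometimes with nothing but whitespace. Here is a sketch of cleaning up such lines; the raw_lines list stands in for real file input and is made up for this example, and the code assumes the string-method forms of the stripping functions.

```python
# Simulated lines as they might come from a file, newlines included.
raw_lines = ["  name = Bill  \n", "\tcity = Seattle\n", "   \n"]

# Strip each line and drop the ones that were only whitespace.
cleaned = [line.strip() for line in raw_lines if line.strip()]
print(cleaned)

# repr() makes the remaining whitespace visible.
print(repr("  padded  ".lstrip()))
print(repr("  padded  ".rstrip()))
```

A line that strips down to the empty string counts as false, which is why the filter above discards blank lines.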

Adjusting Text: ljust(), rjust(), center(), zfill(), expandtabs()

The functions for adjusting text are as handy and convenient as the parsing functions. You'll use them a lot, particularly for attractive report printing.

The ljust(s, width) function left-justifies a string to a given width. The rjust(s, width) function right-justifies it. The center(s, width) function centers a string to a given width. Here are examples of all three:

>>> rjust("String",20)
'              String'
>>> rjust ("str",20)
'                 str'
>>> ljust("String",20)
'String              '
>>> ljust("str",20)
'str                 '
>>> center("str",20)
'        str         '
>>> center("String",20)
'       String       '

zfill(snum,width) pads a numeric string with leading zeros.

>>> zfill("0.1", 10)
'00000000.1'
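To give a feel for report printing, here is a minimal sketch that lines up three columns with ljust(), rjust(), and zfill(). The inventory data and the column widths are hypothetical, and the code uses the string-method forms on the assumption that your interpreter supports them.

```python
# Hypothetical inventory rows: (item name, quantity, unit price).
items = [("apples", 3, 0.5), ("grapefruit", 12, 1.25)]

# Header row: names flush left, numeric columns flush right.
lines = ["Item".ljust(12) + "Qty".rjust(5) + "Price".rjust(8)]

for name, qty, price in items:
    qty_text = str(qty).zfill(3)       # pad the quantity with leading zeros
    price_text = "%.2f" % price        # show two decimal places
    lines.append(name.ljust(12) + qty_text.rjust(5) + price_text.rjust(8))

for line in lines:
    print(line)
```

Text usually reads best left-justified and numbers right-justified, so that the digits line up by place value.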

expandtabs(s,tabsize) replaces each tab with enough spaces to reach the next tab stop, where the tab stops fall every tabsize columns.

Create a string with tabs denoted by \t.

>>> str = "tab\ttab\ttab\t"

Expand the tabs with a tab size of five.

>>> expandtabs(str, 5)
'tab  tab  tab  '

Expand the tabs with a tab size of ten.

>>> expandtabs(str,10)
'tab       tab       tab       '

Expand the tabs with a tab size of twenty.

>>> expandtabs(str, 20)
'tab                 tab                 tab                 '
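It helps to see expandtabs() with words of different lengths, because the number of spaces inserted depends on how far each tab is from the next tab stop. The sketch below assumes the string-method form of expandtabs(); the sample row is made up.

```python
row = "a\tbc\tdef"

# With a tab size of 4, the tab stops are at columns 4, 8, 12, and so on.
expanded = row.expandtabs(4)
print(repr(expanded))

# The first tab becomes three spaces and the second becomes two,
# so "bc" starts at column 4 and "def" starts at column 8.
print(expanded.find("bc"))
print(expanded.find("def"))

# The chapter's sample string with a tab size of 5.
print(repr("tab\ttab\ttab\t".expandtabs(5)))
```

This is why expanded text stays aligned in columns even when the words before each tab differ in length.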

Summary

In its standard distribution, Python provides a rich set of functions for manipulating and parsing strings, something many other programming languages lack. My guess is that you'll use these functions a lot.
