Section 15.5. Exercises


Regular Expressions. Create regular expressions in Exercises 15-1 to 15-12 to:

15-1.

Recognize the following strings: "bat," "bit," "but," "hat," "hit," or "hut."
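
As a starting point, here is one possible sketch for 15-1, not the only answer. It assumes each string is tested on its own as a whole word, and the names `pattern`, `words`, and `matched` are illustrative only.

```python
import re

# One character class per varying position: b or h, then a, i, or u,
# then a literal t.
pattern = re.compile(r'[bh][aiu]t')

# The last two words are deliberate non-matches.
words = ['bat', 'bit', 'but', 'hat', 'hit', 'hut', 'bet', 'cat']

# fullmatch() succeeds only if the whole string fits the pattern.
matched = [w for w in words if pattern.fullmatch(w)]
```

Two character classes cover all six words more compactly than a six-way alternation such as bat|bit|but|hat|hit|hut.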

15-2.

Match any pair of words separated by a single space, i.e., first and last names.

15-3.

Match any word and single letter separated by a comma and single space, as in last name, first initial.

15-4.

Match the set of all valid Python identifiers.
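
A minimal sketch for 15-4 follows. It assumes ASCII-only identifiers (a letter or underscore followed by letters, digits, or underscores) and additionally rejects reserved words through the standard keyword module. Whether keywords count as "valid identifiers" is a judgment call left to the reader, and the helper name `is_identifier` is hypothetical.

```python
import keyword
import re

# First character: letter or underscore; the rest: letters, digits, underscores.
IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def is_identifier(s):
    """Return True for an ASCII identifier that is not a reserved word."""
    return IDENT.fullmatch(s) is not None and not keyword.iskeyword(s)
```

For example, is_identifier('_foo1') is true, while '1abc', 'a-b', and 'for' are all rejected.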

15-5.

Match a street address according to your local format (keep your RE general enough to match any number of street words, including the type designation). For example, American street addresses use the format: 1180 Bordeaux Drive. Make your RE general enough to support multi-word street names like: 3120 De la Cruz Boulevard.

15-6.

Match simple Web domain names that begin with "www." and end with a ".com" suffix, e.g., www.yahoo.com. Extra credit if your RE also supports other high-level domain names: .edu, .net, etc., e.g., www.ucsc.edu.
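
One loose way to approach 15-6 is sketched below. It assumes a single name made of letters, digits, and hyphens between "www." and the suffix. The list of suffixes is an arbitrary illustration, not a complete set, and `is_simple_domain` is a hypothetical helper.

```python
import re

# "www.", one name, then one of a few example suffixes (assumed list).
DOMAIN = re.compile(r'www\.[A-Za-z0-9-]+\.(?:com|edu|net|org|gov)')

def is_simple_domain(s):
    """Return True if s looks like www.<name>.<suffix> and nothing more."""
    return DOMAIN.fullmatch(s) is not None
```

Note that the dots are escaped; an unescaped "." would match any character, so "wwwXyahooXcom" would slip through.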

15-7.

Match the set of the string representations of all Python integers.

15-8.

Match the set of the string representations of all Python longs.

15-9.

Match the set of the string representations of all Python floats.

15-10.

Match the set of the string representations of all Python complex numbers.

15-11.

Match the set of all valid e-mail addresses (start with a loose RE, then try to tighten it as much as you can, yet maintain correct functionality).

15-12.

Match the set of all valid Web site addresses (URLs) (start with a loose RE, then try to tighten it as much as you can, yet maintain correct functionality).

15-13.

type(). The type() built-in function returns a type object, which is displayed as a Pythonic-looking string:

>>> type(0)
<type 'int'>
>>> type(.34)
<type 'float'>
>>> type(dir)
<type 'builtin_function_or_method'>


Create an RE that extracts the actual type name from the string. Your function should take a string such as "<type 'int'>" and return "int". (Ditto for all other types, e.g., 'float', 'builtin_function_or_method', etc.) Note: You are reproducing the value that is stored in the __name__ attribute for classes and some built-in types.
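
A possible sketch for 15-13 is shown here. It targets the "<type '...'>" form printed in the session above; Python 3 prints "<class '...'>" instead, so the literal prefix would need adjusting there. The function name `type_name` is hypothetical.

```python
import re

# Capture everything between the single quotes.
TYPE_NAME = re.compile(r"<type '([^']+)'>")

def type_name(s):
    """Return the quoted name from a "<type '...'>" string, or None."""
    m = TYPE_NAME.match(s)
    return m.group(1) if m else None
```

With this sketch, type_name("<type 'int'>") returns 'int', and a string that does not follow the expected form returns None.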

15-14.

Regular Expressions. In Section 15.2, we gave you the RE pattern that matched the single- or double-digit string representations of the months January to September ("0?[1-9]"). Create the RE that represents the remaining three months in the standard calendar.
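
The following sketch shows one way to cover October through December and then combine that with the pattern quoted above so all twelve months are accepted. The names `LATE_MONTHS` and `ANY_MONTH` are illustrative only.

```python
import re

# October, November, December: a 1 followed by 0, 1, or 2.
LATE_MONTHS = re.compile(r'1[0-2]')

# The pattern from the text joined with the new one by alternation.
ANY_MONTH = re.compile(r'0?[1-9]|1[0-2]')

# Collect every string from "0" to "13" that is accepted as a month.
accepted = [str(n) for n in range(0, 14) if ANY_MONTH.fullmatch(str(n))]
```

Here `accepted` holds the strings "1" through "12"; "0" and "13" are rejected.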

15-15.

Regular Expressions. Also in Section 15.2, we gave you the RE pattern that matched credit card (CC) numbers ("[0-9]{15,16}"). However, this pattern does not allow for hyphens separating blocks of numbers. Create the RE that allows hyphens, but only in the correct locations. For example, 15-digit CC numbers have a pattern of 4-6-5, indicating four digits-hyphen-six digits-hyphen-five digits, and 16-digit CC numbers have a 4-4-4-4 pattern. Remember to "balloon" the size of the entire string correctly. Extra credit: There is a standard algorithm for determining whether a CC number is valid. Write some code not only to recognize a correctly formatted CC number, but also a valid one.
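
Below is a hedged sketch for 15-15. The text does not name the validation algorithm; this example assumes it means the Luhn checksum, which is the usual check for card numbers. The helper names `luhn_ok` and `is_valid_cc` are hypothetical, and the numbers used in the usage note are widely published test numbers, not real accounts.

```python
import re

# Hyphenated 4-6-5, hyphenated 4-4-4-4, or 15 to 16 digits with no hyphens.
CC = re.compile(r'\d{4}-\d{6}-\d{5}|\d{4}(?:-\d{4}){3}|\d{15,16}')

def luhn_ok(number):
    """Luhn checksum: double every second digit counting from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0

def is_valid_cc(s):
    """Well formed (hyphens only in the right places) and Luhn-valid."""
    return CC.fullmatch(s) is not None and luhn_ok(s)
```

For instance, is_valid_cc('4111-1111-1111-1111') and is_valid_cc('3782-822463-10005') are both true, while a misplaced hyphen or a changed final digit makes the check fail. The hyphenated forms are 19 and 17 characters long, which is the "ballooning" the exercise mentions.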

The next set of problems (15-16 through 15-27) deals specifically with the data that are generated by gendata.py. Before approaching problems 15-17 and 15-18, you may wish to do 15-16 and all the regular expressions first.

15-16.

Update the code for gendata.py so that the data are written directly to redata.txt rather than output to the screen.

15-17.

Determine how many times each day of the week shows up for any incarnation of redata.txt. (Alternatively, you can also count how many times each month of the year was chosen.)
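
One way to approach the counting in 15-17 is sketched below. This section does not show the layout of redata.txt, so the sketch assumes each line starts with a ctime-style timestamp followed by "::"-separated fields. The sample lines are made up for illustration, and `count_days` is a hypothetical helper.

```python
import re
from collections import Counter

# Hypothetical sample lines in the ASSUMED layout; real data would be
# read from redata.txt.
SAMPLE = """\
Thu Jul 22 19:21:19 2004::izsp@dicqdhytvhv.edu::1090549279-4-11
Sun Jul 13 22:42:11 2008::zqeu@fwmgvfdctg.com::1216014131-4-10
Thu Feb 19 04:10:31 2032::xk@zzz.net::1960805431-2-3
"""

# A three-letter day abbreviation at the very start of the line.
DAY = re.compile(r'(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b')

def count_days(lines):
    """Tally the leading day-of-week abbreviation of each line."""
    counts = Counter()
    for line in lines:
        m = DAY.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

day_counts = count_days(SAMPLE.splitlines())
```

A Counter returns 0 for days that never appear, so no special handling is needed for missing keys. Counting months instead only requires a second group for the month abbreviation.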

15-18.

Ensure there is no data corruption in redata.txt by confirming that the first integer of the integer field matches the timestamp given at the front of each output line.
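
The check in 15-18 might be sketched as follows. It assumes the timestamp is in time.ctime() format, the fields are separated by "::", and the last field looks like "seconds-number-number". None of this is shown in this section, so treat the layout as an assumption. The two test lines are built in code, rather than hard-coded, so that they agree with the local time zone. `is_consistent` is a hypothetical helper.

```python
import re
import time

# Group 1: the timestamp; group 2: the first integer of the last field.
LINE = re.compile(r'^(.+?)::.+?::(\d+)-\d+-\d+$')

def is_consistent(line):
    """True if the leading timestamp equals the seconds in the last field."""
    m = LINE.match(line)
    if not m:
        return False
    stamp, seconds = m.group(1), int(m.group(2))
    # strptime() with no format argument parses the ctime() layout.
    return int(time.mktime(time.strptime(stamp))) == seconds

ts = 1090549279  # arbitrary example value
good = '%s::izsp@dicqdhytvhv.edu::%d-4-11' % (time.ctime(ts), ts)
bad = '%s::izsp@dicqdhytvhv.edu::%d-4-11' % (time.ctime(ts), ts + 1)
```

Here is_consistent(good) is true and is_consistent(bad) is false, because the integer in the second line is off by one second.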

Create regular expressions to:

15-19.

Extract the complete timestamps from each line.

15-20.

Extract the complete e-mail address from each line.

15-21.

Extract only the months from the timestamps.

15-22.

Extract only the years from the timestamps.

15-23.

Extract only the time (HH:MM:SS) from the timestamps.

15-24.

Extract only the login and domain names (both the main domain name and the high-level domain together) from the e-mail address.

15-25.

Extract only the login and domain names (the main domain name and the high-level domain as separate items) from the e-mail address.

15-26.

Replace the e-mail address from each line of data with your e-mail address.

15-27.

Extract the months, days, and years from the timestamps and output them in "Mon Day, Year" format, iterating over each line only once.
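
For 15-27, a single match per line with three groups is enough. The sketch below assumes ctime-style timestamps at the start of each line, a layout that this section does not show; the sample line in the usage note is made up, and `short_date` is a hypothetical helper.

```python
import re

# Skip the day name, capture month and day, skip the time, capture the year.
# " +" allows for the extra space ctime() puts before single-digit days.
STAMP = re.compile(r'\w{3} (\w{3}) +(\d{1,2}) \d{2}:\d{2}:\d{2} (\d{4})')

def short_date(line):
    """Return 'Mon Day, Year' from a line's leading timestamp, or None."""
    m = STAMP.match(line)
    if m is None:
        return None
    month, day, year = m.groups()
    return '%s %s, %s' % (month, day, year)
```

Given the line "Thu Jul 22 19:21:19 2004::izsp@dicqdhytvhv.edu::1090549279-4-11", short_date() returns "Jul 22, 2004". Because all three pieces come from one match object, each line is scanned only once.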

For problems 15-28 and 15-29, recall the regular expression introduced in Section 15.2, which matched telephone numbers but required the area code prefix: \d{3}-\d{3}-\d{4}. Update this regular expression so that:

15-28.

Area codes (the first set of three digits and the accompanying hyphen) are optional, i.e., your RE should match both 800-555-1212 and 555-1212.

15-29.

Area codes may be either parenthesized or hyphenated, and they remain optional; make your RE match 800-555-1212, 555-1212, and (800) 555-1212.
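
The two telephone exercises can be sketched as shown here. This is one possible reading: it assumes a parenthesized area code is followed by exactly one space, as in the example given. The names `PHONE_28`, `PHONE_29`, and `matches` are illustrative only.

```python
import re

# 15-28: the area code and its hyphen form one optional group.
PHONE_28 = re.compile(r'(?:\d{3}-)?\d{3}-\d{4}')

# 15-29: the optional prefix is either "(ddd) " or "ddd-".
PHONE_29 = re.compile(r'(?:\(\d{3}\) |\d{3}-)?\d{3}-\d{4}')

def matches(pattern, s):
    """True if the whole string s fits the compiled pattern."""
    return pattern.fullmatch(s) is not None
```

The literal parentheses must be escaped as \( and \), since bare parentheses would create a group instead of matching the characters.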



Core Python Programming (2nd Edition)
ISBN: 0132269937
Year: 2004
Pages: 334
Authors: Wesley J Chun
