
14.10. Exercises

14-1.

Callable Objects. Name Python's callable objects.

exec versus eval(). What is the difference between the exec statement and the eval() BIF?

14-2.

input() versus raw_input(). What is the difference between the BIFs input() and raw_input()?

14-3.

Execution Environment. Create a Python script that runs other Python scripts.

14-4.

os.system(). Choose a familiar system command that performs a task without requiring input and either outputs to the screen or does not output at all. Use the os.system() call to run that program. Extra credit: Port your solution to subprocess.call().

14-5.

commands.getoutput(). Solve the previous problem using commands.getoutput().

14-6.

popen() Family. Choose another familiar system command that takes text from standard input and manipulates or otherwise outputs the data. Use os.popen() to communicate with this program. Where does the output go? Try using popen2.popen2() instead.

14-7.

subprocess Module. Take your solutions from the previous problem and port them to the subprocess module.

14-8.

Exit Function. Design a function to be called when your program exits. Install it as sys.exitfunc(), run your program, and show that your exit function was indeed called.

14-9.

Shells. Create a shell (operating system interface) program. Present a command-line interface that accepts operating system commands for execution (any platform).

Extra credit 1: Support pipes (see the dup(), dup2(), and pipe() functions in the os module). This piping procedure allows the standard output of one process to be connected to the standard input of another.

Extra credit 2: Support inverse pipes using parentheses, giving your shell a functional programming-like interface. In other words, instead of piping commands like ...

ps -ef | grep root | sort -n +1


... support a more functional style like...

sort(grep(ps -ef, root), -n, +1)


14-10.

fork()/exec*() versus spawn*(). What is the difference between using fork()/exec*() pairs and using the spawn*() family of functions? Do you get more with one than with the other?

14-11.

Generating and Executing Python Code. Take the funcAttrs.py script (Example 14.4) and use it to add testing code to functions that you have in some of your existing programs. Build a testing framework that runs your test code every time it encounters your special function attributes.



Part II: Advanced Topics


Chapter 15. Regular Expressions

Chapter Topics

  • Introduction/Motivation

  • Special Characters and Symbols

  • Regular Expressions and Python

  • re Module



15.1. Introduction/Motivation

Manipulating text and data is a big thing. If you don't believe me, look very carefully at what computers primarily do today: word processing, "fill-out-form" Web pages, streams of information coming from a database dump, stock quote information, news feeds; the list goes on and on. Because we may not know the exact text or data that our machines will be asked to process, it becomes advantageous to be able to express this text or data in patterns that a machine can recognize and take action upon.

If I were running an electronic mail (e-mail) archiving company, and you were one of my customers who requested all of your e-mail sent and received last February, for example, it would be nice if I could set a computer program to collate and forward that information to you, rather than having a human being read through your e-mail and process your request manually. You would be horrified (and infuriated) that someone would be rummaging through your messages, even if his or her eyes were supposed to be looking only at the time stamps. Another example request might be to look for a subject line like "ILOVEYOU," indicating a virus-infected message, and remove those e-mail messages from your personal archive. This raises the question of how we can program machines with the ability to look for patterns in text.

Regular expressions (REs) provide such an infrastructure for advanced text pattern matching, extraction, and/or search-and-replace functionality. REs are simply strings that use special symbols and characters to indicate pattern repetition or to represent multiple characters so that they can "match" a set of strings with similar characteristics described by the pattern (Figure 15-1). In other words, they enable matching of multiple strings. An RE pattern that matched only one string would be rather boring and ineffective, wouldn't you say?

Figure 15-1. You can use regular expressions, such as the one here, which recognizes valid Python identifiers. "[A-Za-z]\w+" means the first character should be alphabetic, i.e., either A-Z or a-z, followed by at least one (+) alphanumeric character (\w). In our filter, notice how many strings go into the filter, but the only ones to come out are the ones we asked for via the RE. One example that did not make it was "4xZ" because it starts with a number.
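
The filter in the figure can be sketched in a few lines. This is only an illustration under some assumptions: it targets Python 3.4 or later (for re.fullmatch()), the helper name filter_identifiers is hypothetical, and every sample string except "4xZ" is made up for the demonstration. Note that the pattern is a simplification of the real identifier rules: it rejects one-letter names and names that begin with an underscore.

```python
import re

# Pattern from Figure 15-1: one letter followed by one or more \w characters.
IDENT_RE = re.compile(r'[A-Za-z]\w+')

def filter_identifiers(candidates):
    """Return only the candidates that the pattern matches in full."""
    return [s for s in candidates if IDENT_RE.fullmatch(s)]

# Hypothetical inputs; only "4xZ" comes from the figure caption.
samples = ["foo", "4xZ", "x1", "hello_world", "a", "_private", "9lives"]
accepted = filter_identifiers(samples)
print(accepted)  # ['foo', 'x1', 'hello_world']
```

The strings "a" and "_private" are legal Python names, yet the pattern filters them out, which shows where this RE falls short of the full rule.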


Python supports REs through the standard library re module. In this subsection, we will give you a brief introduction. Due to its brevity, only the most common aspects of REs used in everyday Python programming will be covered. Your experience will, of course, vary. We highly recommend reading the official supporting documentation as well as external texts on this interesting subject. You will never look at strings the same way again!

Core Note: Searching versus matching

Throughout this chapter, you will find references to searching and matching. When we are strictly discussing regular expressions with respect to patterns in strings, we will say "matching," referring to the term pattern-matching. In Python terminology, there are two main ways to accomplish pattern-matching: searching, i.e., looking for a pattern match in any part of a string, and matching, i.e., attempting to match a pattern starting at the beginning of a string (the match need not extend to the end of the string). Searches are accomplished using the search() function or method, and matching is done with the match() function or method. In summary, we keep the term "matching" universal when referencing patterns, and we differentiate between "searching" and "matching" in terms of how Python accomplishes pattern-matching.
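
A short sketch may make the distinction concrete. The sample strings "seafood" and "food" are chosen here purely for illustration; re.match() and re.search() are the standard re module functions named in the note.

```python
import re

text = "seafood"

# match() only tries the pattern at position 0, so "foo" is not found.
at_start = re.match("foo", text)

# search() scans forward and finds "foo" starting at index 3.
anywhere = re.search("foo", text)

# match() is anchored at the start but need not consume the whole string.
prefix = re.match("foo", "food")

print(at_start)         # None
print(anywhere.span())  # (3, 6)
print(prefix.group())   # foo
```

Both calls return a match object on success and None on failure, so the result should be tested before calling methods such as group() on it.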


15.1.1. Your First Regular Expression

As we mentioned above, REs are strings containing text and special characters that describe a pattern with which to recognize multiple strings. Regular expressions are defined over an alphabet, and for general text, the alphabet used is the set of all uppercase and lowercase letters plus numeric digits. Specialized alphabets are also possible, for instance, one consisting of only the characters "0" and "1". The set of all strings over this alphabet describes all binary strings, i.e., "0," "1," "00," "01," "10," "11," "100," etc.
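
To see what "all strings over an alphabet" means in practice, here is a minimal sketch that enumerates them shortest first. The generator name strings_over is hypothetical and not part of any library; the code relies only on itertools from the standard library.

```python
from itertools import count, islice, product

def strings_over(alphabet):
    """Yield every non-empty string over `alphabet`, shortest first."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

# The set is infinite, so take only the first six strings.
first_six = list(islice(strings_over("01"), 6))
print(first_six)  # ['0', '1', '00', '01', '10', '11']
```

Passing a different alphabet, such as "abc", enumerates the strings over that alphabet in the same way.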

Let us look at the most basic of regular expressions now to show you that although REs are sometimes considered an "advanced topic," they can also be rather simple. Using the standard alphabet for general text, we present some simple REs and the strings that their patterns describe. The following regular expressions are the most basic, "true vanilla," as it were. They simply consist of a string pattern that matches only one string, the string defined by the regular expression. We now present the REs followed by the strings that match them:

RE Pattern    String(s) Matched
foo           foo
Python        Python
abc123        abc123


The first regular expression pattern from the above chart is "foo." This pattern contains no special symbols, so each character stands only for itself, and the only string that matches this pattern is the string "foo." The same thing applies to "Python" and "abc123." The power of regular expressions comes in when special characters are used to define character sets, subgroup matching, and pattern repetition. It is these special symbols that allow an RE to match a set of strings rather than a single one.
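
The chart can be checked with a quick sketch. It assumes Python 3.4 or later for re.fullmatch(), which requires the pattern to account for the entire string; the helper name matches_exactly is hypothetical.

```python
import re

def matches_exactly(pattern, text):
    """Return True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

print(matches_exactly("foo", "foo"))        # True
print(matches_exactly("foo", "food"))       # False: extra character
print(matches_exactly("Python", "python"))  # False: case differs
print(matches_exactly("abc123", "abc123"))  # True
```

With no special symbols in the pattern, the only string that passes is the pattern itself.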