Perl

Perl, the Practical Extraction and Report Language, is a mouthful to say, but it shouldn't be judged by its somewhat bland name. Originally designed to make working with text data simple, Perl has been expanded by developers to handle tasks such as image manipulation and client/server activities. Because of its ease of use and its capability to work with ambiguous user input, Perl is a popular web development language.

For example, assume that you want to extract a phone number from an input string. A user might enter 555-5654, 5552231, 421-5552313, and so on. It is up to the application to find the area code, local exchange, and identifier numbers. In Perl, doing so is simple:

    #!/usr/bin/perl

    print "Please enter a phone number:";
    $phone=<STDIN>;
    $phone=~s/[^\d]//g;
    $phone=~s/^1//;
    if (length($phone)==7) {
        $phone=~/(\d{3,3})(\d{4,4})/;
        $area="???";
        $prefix=$1;
        $number=$2;
    } elsif (length($phone)==10) {
        $phone=~/(\d{3,3})(\d{3,3})(\d{4,4})/;
        $area=$1;
        $prefix=$2;
        $number=$3;
    } else {
        print "Invalid number!";
        exit;
    }
    print "($area) $prefix-$number\n";

This program accepts a phone number as input, strips any nondigit characters from it, removes a leading 1 if one is included, and then formats the result in an attractive manner. Applying this capability for mining data from user input to web development gives programmers the opportunity to write extremely user-friendly software.

Perl programs are similar to shell scripts in that they are interpreted by an additional piece of software. Each script starts with a line that includes the path to the Perl interpreter. In Mac OS X, this is typically #!/usr/bin/perl. After you enter a script, you must make it executable by typing chmod +x <script name>. Finally, you can run it by entering its complete path at the command line or by typing ./<script name> from the same directory as the script. Alternatively, you can invoke a script by passing it as an argument to Perl; that is, perl <script name>.
For more information on this process, refer to Chapter 15, "Shell Configuration and Programming (Shell Scripting)." Although this chapter provides enough information to write a program like the one shown here, it is not a complete reference to Perl. Perl is an object-oriented language with thousands of functions. Sams Teach Yourself Perl in 21 Days is an excellent read and a great way to bone up on the topic.
Variables and Data Types

Perl has a number of different variable types, but the most common are shown in Table 18.1. Perl variable names are composed of alphanumeric characters and are case sensitive, unlike much of Mac OS X. This means that a variable named $mymacosx is entirely different from $myMacOSX. Unlike some languages, such as C, Perl performs automatic type conversion when possible. A programmer can use a variable as a number in one statement and a string in the next.
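The automatic type conversion and case sensitivity described above can be tried out with a few lines of Perl. This is only a minimal sketch; the variable names $value, $sum, and $label are made up for the illustration and do not come from the chapter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $value starts out as a string of digits.
my $value = "42";

# Used with +, Perl treats it as the number 42.
my $sum = $value + 8;

# Used with . (string concatenation), Perl treats it as text.
my $label = $value . " apples";

# Names are case sensitive: these are two separate variables.
my $mymacosx = "lowercase name";
my $myMacOSX = "mixed-case name";

print "$sum\n";      # prints 50
print "$label\n";    # prints 42 apples
print "$mymacosx / $myMacOSX\n";
```

The same scalar is read as a number or as a string depending only on the operator applied to it, which is why no explicit conversion call appears anywhere in the sketch.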
Input/Output Functions

Because Perl is so useful for manipulating data, one of the first things you'll want to do is get data into a script. There are a number of ways to do this, including reading from a file or the Terminal window. To Perl, however, command-line input and file input are much the same thing. To use either, you must read from an input stream.

Input Streams

To input data into a variable from a file, use $variable=<FILEHANDLE>. This inputs data up to a newline character into the named variable. To read from the command line, the filehandle is replaced with a special handle that points to the standard input stream: <STDIN>.

When data is read from an input stream, it contains the end-of-line character (newline) as part of the data. This is usually an unwanted piece of information that can be stripped off using the chomp command. Failure to use chomp often results in debugging headaches as you attempt to figure out why your string comparison routines are failing.

For example, the following reads a line from standard (command line) input and removes the trailing newline character:

    $myname=<STDIN>;
    chomp($myname);

To read data from an actual stored file, you must first open it with open <FILEHANDLE>, "<filename>". For example, the following reads the first line of a file named MacOSX.txt:

    open FILEHANDLE, "MacOSX.txt";
    $line1=<FILEHANDLE>;
    close FILEHANDLE;

When you've finished reading a file, use close followed by the filehandle to be closed.

Outputting Data

Outputting data is the job of the print command. print can display text strings or the contents of variables. In addition, you can embed special characters in a print statement that are otherwise unprintable. For example:

    print "I love Mac OS X!\n----------------\n";

In this sample line, the \n is a newline character; it moves the cursor down a line so that subsequent output occurs on a new line, rather than on the same line as the current print statement.
Table 18.2 contains other common special characters.
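The reading steps covered so far (open, read a line, chomp, close) can be combined into one short script. The following is a sketch under stated assumptions: the scratch file name MacOSX-sample.txt and its two lines of content are invented for the illustration, and the script creates the file itself (using the output form of open that this chapter covers under file output) so that it runs on its own. The or die checks are an addition that the chapter's shorter examples leave out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a scratch file name chosen only for this illustration.
my $file = "MacOSX-sample.txt";

# Create the file first so the example is self-contained.
open(OUT, "> $file") or die "Cannot write $file: $!";
print OUT "first line\nsecond line\n";
close(OUT);

# Read the file back one line at a time.
my @lines;
open(IN, $file) or die "Cannot read $file: $!";
while (my $line = <IN>) {
    chomp($line);            # strip the trailing newline
    push(@lines, $line);
}
close(IN);

# Without chomp, "first line\n" would not be equal to "first line".
if ($lines[0] eq "first line") {
    print "Line 1 matches.\n";
}
print scalar(@lines), " lines read\n";

unlink($file);               # remove the scratch file
```

Removing the chomp call is an easy way to see the comparison problem described earlier: the eq test then fails because each stored line still ends in a newline.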
Many characters (such as ") have a special meaning in Perl; if you want to refer to them literally, you must prefix them with \. This is called escaping the character. In most cases, nonalphanumeric characters should be escaped just to be on the safe side.

File Output

To output data to a file rather than standard output, you must first open a file to receive the information. This is nearly identical to the open operation used to read data, except for one difference. When writing to a file, you must prefix the name of the file with one of two different character strings:

    >   Creates the file, or overwrites it if it already exists.
    >>  Appends to the end of the file, creating it if it does not exist.
With a file open, the print command is again used for output. This time, however, it includes the filehandle of the output file. For example, this code saves "Mac OS X" to a file named MyOS.txt:

    open MYFILE, "> MyOS.txt";
    print MYFILE "Mac OS X\n";
    close MYFILE;

Again, the close command is used to close the file when all output has completed.

External Results (``)

One of the more novel (and powerful) ways to get information into Perl is through an external program. For example, to quickly and easily grab a listing of running processes, you could use the output of the Unix ps axg command:

    $processlist=`ps axg`;

Backtick (``) characters should be placed around the command whose output you want to capture. Perl pauses and waits for the external command to finish executing before it continues processing.

This is both a dangerous and powerful tool. You can easily read an entire file into a variable by using the cat command with backticks. Unfortunately, if the external program never finishes executing, the Perl script hangs along with it, possibly indefinitely.

Expressions

Although Perl variables can hold numbers or strings, you still need to perform the appropriate type of comparison based on the values being compared. For example, numbers can be compared for equality using ==, but strings must be compared with eq. If you attempt to use == to compare two strings that do not begin with a number, the expression evaluates to true because the numeric value of both strings is zero, regardless of the text they contain. Table 18.3 displays common Perl expressions.
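The difference between == and eq is easy to see in a small test. This is a minimal sketch; the variable names are invented for the illustration, and the no warnings 'numeric' line is there only to silence the warning that Perl would otherwise print about the deliberate mistake.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first  = "Mac";
my $second = "PC";

my ($numeric_equal, $string_equal);
{
    # Silence the "isn't numeric" warning for the deliberate misuse of ==.
    no warnings 'numeric';
    # Both strings have a numeric value of 0, so == reports them as equal.
    $numeric_equal = ($first == $second) ? 1 : 0;
}

# eq compares the text itself, so the strings are correctly seen as different.
$string_equal = ($first eq $second) ? 1 : 0;

print "== says equal: $numeric_equal\n";   # prints 1
print "eq says equal: $string_equal\n";    # prints 0
```

Running scripts with use warnings enabled, as here, is one way to catch this mistake early, because Perl then reports the nonnumeric argument instead of silently comparing zeros.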
Regular Expressions

Regular expressions (regex) are a bit more interesting than the expressions in the preceding section. Like one of the previous expressions, a regex evaluates to a true or false state. In addition, regular expressions are used to locate and extract data from strings.

For example, assume that the variable $mycomputer contains the information My computer is a Mac. To create a regular expression that would test the string for the presence of the word mac, you could write

    $mycomputer=~/mac/i

Although this line might look like an assignment statement, it is in fact looking inside the variable $mycomputer for the pattern mac. The pattern that a regular expression matches is delimited by the / characters (unless changed by the programmer). The i after the expression tells Perl that it should perform a case-insensitive search, allowing it to match strings such as MAC and mAC.

To understand the power of regular expressions, you must first understand the pattern-matching language that comprises them.

Patterns

Regular expressions are made up of groups of pattern-matching symbols. These special characters symbolically represent the contents of a string and can be used to build complex pattern-matching rules with relative ease. Table 18.4 contains the most common components of regular expressions and their purpose.
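As a quick illustration of the match operator and the i modifier discussed above, the following sketch runs the same pattern three ways against the sample string. The result variable names are made up for the example; the m{...} form simply shows one way a programmer can change the delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mycomputer = "My computer is a Mac";

# Case-insensitive: "Mac" satisfies the pattern mac.
my $loose  = ($mycomputer =~ /mac/i)  ? "match" : "no match";

# Case-sensitive: the string contains no lowercase "mac".
my $exact  = ($mycomputer =~ /mac/)   ? "match" : "no match";

# The same case-insensitive test with braces as the delimiter.
my $braces = ($mycomputer =~ m{mac}i) ? "match" : "no match";

print "with i:    $loose\n";    # prints match
print "without i: $exact\n";    # prints no match
print "m{} form:  $braces\n";   # prints match
```

Because the match returns a true or false value, it can be dropped directly into a conditional, which is how it is used in the flow control examples in this chapter.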
The bracket characters enable you to clearly define the characters that you want to match if a predefined sequence doesn't already exist. For example, if you want to match only the uppercase letters A through Z and the numbers 1, 2, and 3, you could write

    [A-Z123]

As shown in this example, you can represent a contiguous sequence of letters or numbers as a range by specifying the start and end characters of the range, separated by a -.

Pattern Repetition

With the capability to write patterns, you can match arbitrary strings within a character sequence. What's missing is the capability to match strings of varying lengths. These repetition characters modify the pattern they follow and enable it to be matched once, twice, or as many times as you want:
When a repetition sequence is followed by a ?, the pattern will match as few characters as possible to be considered true.

As an example of repetition, the following expression matches between 5 and 10 consecutive occurrences of the numbers 1, 2, or 3:

    $testnumbers=~/[1-3]{5,10}/;

The capability to match an arbitrary number of characters enables programmers to deal with information they might not be expecting.

Extracting Information from a Regular Expression

Although it's useful to be able to find strings that contain a certain pattern, it's even better if the matching data can be extracted and used. To extract pieces of information from a match, you can enclose the pattern within parentheses ().

To see this in action, let's go back to the original telephone number program that introduced Perl in this chapter. One of the regular expressions extracted the parts of a 10-digit phone number from a string of 10 digits:

    $phone=~/(\d{3,3})(\d{3,3})(\d{4,4})/;

There are three parts to the regular expression, each enclosed within parentheses. The first two parts (\d{3,3}) capture strings of three consecutive digits, and the third part (\d{4,4}) captures the remaining four.

For each set of parentheses used in a pattern, a numbered variable ($1, $2, and so on) is created that corresponds to the order in which the parentheses are found. Because the area code is the first set of parentheses in the example, it is $1, the local prefix is $2, and the final four digits are held in $3.

Search and Replace

Because you can easily find a pattern in a string, wouldn't it be nice if you could replace it with something else? Perl enables you to do just that by writing your regular expression line a little bit differently:

    $a=~s/<search pattern>/<replace pattern>/

This simple change (adding the s [substitute] prefix and a second pattern) enables you to modify data in a variable so that it is exactly what you're expecting, removing extraneous data.
For example, matching a phone number in the variable $phone and then changing it to a standard format can be accomplished in a single step:

    $phone=~s/(\d{3,3})(\d{3,3})(\d{4,4})/($1) $2-$3/;

A new string in the format (xxx) xxx-xxxx replaces the phone number found in the original string. This enables a programmer to modify data on the fly, transforming user input into a more usable form.

Regular expressions are not easy for many people to learn, and a single misplaced character can trip you up. Don't feel bad if you're confused at first; just keep at it. An understanding of regular expressions is important in many languages, and if regular expressions are properly used, they can be a powerful development tool.

Implementing Flow Control

Flow control statements give Perl the capability to alter its execution and adapt to different conditions on the fly. Perl uses standard C-like syntax for its looping and conditional constructs. If you've used C or Java before, these statements should look familiar.

if-then-else

Perl's if-then-else logic is simple to understand. If a condition is met, a block of code is executed. If the condition is not met, a different piece of programming is run. The syntax for this type of conditional statement is

    if (<expression>) {
        <statements...>
    } else {
        <statements...>
    }

For example, to test whether the variable $mycomputer contains the string "Mac OS X" and print Good Choice! if it does, you could write the following:

    if ($mycomputer=~/mac os x/i) {
        print "Good Choice!\n";
    } else {
        print "Buy a Mac!\n";
    }

The curly brackets {} are used to set off code blocks within Perl. The brackets denote the portion of code that a conditional, looping, or subroutine construct applies to.

unless-then-else

The unless statement is syntactically identical to the if-then statement, except that it operates on the inverse of the expression (and uses the word unless rather than if).
To change the previous example so that it uses unless, write

    unless ($mycomputer=~/mac os x/i) {
        print "Buy a Mac!\n";
    } else {
        print "Good Choice!\n";
    }

The unless condition is rarely used in Perl applications and is provided mainly as a way to write code in a more readable manner.

while

The while loop enables you to execute a block of code repeatedly while a condition remains true. At the start of each iteration, an expression is evaluated; if it returns true, the loop body executes. If the expression does not return true, the loop exits. The syntax for a Perl while loop is

    while (<expression>) {
        <statements>
    }

For example, to monitor a process listing every 30 seconds to see whether the application Terminal is running, the following code fragment could be employed:

    $processlist=`ps axg`;
    while (!($processlist=~/terminal/i)) {
        print "Terminal has not been detected.\n";
        sleep 30;
        $processlist=`ps axg`;
    }
    print "The Terminal process is running.\n";

Here the output of the ps axg command is stored in $processlist. This is then searched using a regular expression in the while loop. If the pattern terminal is located, the loop exits, and the message The Terminal process is running. is displayed. If not, the script sleeps for 30 seconds and then tries again.

for-next

The for-next loop is the most fundamental of all looping constructs. This loop iterates through a series of values until a condition (usually a numeric limit) is met. The syntax for a for-next loop is

    for (<initialization>;<execution condition>;<increment>) {
        <code block>
    }

The initialization sets up the loop and initializes the counter variable to its default state. The execution condition is checked with each iteration of the loop; if the condition evaluates to false, the loop ends. Finally, the increment is a piece of code that defines an operation performed on the counter variable each time the loop is run.
For example, the following loop counts from 0 to 9:

    for ($count=0;$count<10;$count++) {
        print "Count = $count\n";
    }

The counter, $count, is set to 0 when the loop starts. With each repetition, it is incremented by 1 ($count++). The loop exits when the counter reaches 10 and the condition $count<10 is no longer true.

Creating Subroutines

Subroutines help modularize code by dividing it into smaller functional units. Rather than creating a gigantic block of Perl that does everything under the sun, you can create subroutines that are easier to read and debug.

A subroutine is started with the sub keyword and the name the subroutine should be called. The body of the subroutine is enclosed in curly brackets {}. For example, here is a simple subroutine that prints Mac OS X Tiger:

    sub printos {
        print "Mac OS X Tiger\n";
    }

You can include subroutines anywhere in your source code and call them at any time by prefixing their name with & (&printos). Subroutines can also be set up to receive values from the main program and return results. For example, this routine accepts two strings and concatenates them together:

    sub concatenatestring {
        my ($x,$y)=@_;
        return ("$x$y");
    }

To retrieve the concatenation of the strings "Mac" and "OS X", the subroutine would be addressed as

    $result=&concatenatestring("Mac","OS X");

Data is received by the subroutine through the use of the special variable @_. The two values it contains are then stored in local variables (denoted by the my keyword) named $x and $y. Finally, the return statement returns a concatenated version of the two strings.

Expanding Perl Functionality with CPAN Modules

Perl can be extended to offer additional functionality ranging from Internet access to graphics generation. Just about anything you could ever want to do can be done using Perl; you just need the right module.

The best place to find the right Perl module is CPAN, the Comprehensive Perl Archive Network. CPAN contains an ever-increasing list of Perl modules with their descriptions and documentation.
To browse CPAN, point your web browser to http://www.cpan.org.

There are two ways to install modules located in the CPAN archive. The first is using a built-in Perl module that directly interacts with CPAN from your desktop computer. The second is the traditional method of downloading, unarchiving, and installing just as with any other software. Perl modules are a bit easier to install than most software because the installed code ends up in the Perl directory instead of needing to be placed in a variety of directories across the entire system hierarchy.

Two Perl modules (DBI::DBD and DBD::mysql) will be used in Chapter 19 to demonstrate Perl/database integration. Conveniently, this corresponds to the two available installation methods. Let's take a look at both methods now and then put them to practice in Chapter 19.

Note: These examples are meant to document the process of installing any module. If you try to follow these instructions without MySQL installed, you will see errors.

CPAN Installation

Using the interactive method of installing Perl modules is as simple as install <module name>. To start the interactive module installation shell, type sudo cpan at a command line. The CPAN installer shell starts:

    cpan shell -- CPAN exploration and modules installation (v1.70)

    cpan>
At the cpan> prompt, type install <modulename> to begin the installation process. For example, to add the DBI::DBD module:

    cpan> install DBI::DBD
    Issuing "/usr/bin/ftp -n"
    Local directory now /private/var/root/.cpan/sources/modules
    GOT /var/root/.cpan/sources/modules/03modlist.data.gz
    ...
    CPAN: MD5 security checks disabled because MD5 not installed.
      Please consider installing the MD5 module.
    ...
    Installing /usr/bin/dbiproxy
    Installing /usr/bin/dbish
    Writing /Library/Perl/darwin/auto/DBI/.packlist
    Appending installation info to /System/Library/Perl/darwin/perllocal.pod
      /usr/bin/make install  -- OK

Depending on your Perl installation and version, you might notice a number of messages pertaining to different Perl modules during the installation. Each time the CPAN shell is used, it checks for new versions of itself. If a new version is found, it provides instructions on how to install the update (install Bundle::CPAN). Don't concern yourself too much about these messages unless the installation fails.
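One way to confirm that an installation succeeded is to ask Perl to load the module. The following is a minimal sketch, not part of the CPAN shell: module_available is a hypothetical helper written for this illustration, and File::Basename (a module that ships with Perl) is used only so the script reports at least one module as present on any system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 if not.
sub module_available {
    my ($module) = @_;
    # Accept only plain module names (word characters separated by ::).
    return 0 unless $module =~ /^\w+(::\w+)*$/;
    # require dies if the module is missing; eval traps that failure.
    return eval("require $module; 1") ? 1 : 0;
}

foreach my $name ("File::Basename", "DBI::DBD") {
    if (module_available($name)) {
        print "$name is installed\n";
    } else {
        print "$name is not installed\n";
    }
}
```

The result for DBI::DBD depends on whether the installation shown above has been carried out on the machine where the script runs.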
After CPAN has completed the installation process, the module is ready to use; there's no need to reboot. The next time you invoke Perl, the module will be available. For more control within the CPAN shell, you can use these additional commands:
Modules that have been downloaded are stored in the .cpan directory within your home directory. Keep track of the size of this directory because it will continue to grow as long as you install new modules.

Archive-Based Installation

The second form of module installation is archive-based. This is almost identical to installing other types of software, so there shouldn't be many surprises here. First, download the package to install from CPAN; in this example, I'm using a package called DBD-mysql, which you'll use in Chapter 19 to access the MySQL database system:

    % curl -O ftp://ftp.cpan.org/pub/CPAN/modules/by-module/DBD/DBD-mysql-2.0901.tar.gz

Next, unarchive the module:

    % tar zxf DBD-mysql-2.0901.tar.gz

Enter the distribution directory and enter this command: perl Makefile.PL. This automatically configures the package and generates a makefile that you can use to compile and install the module:

    % perl Makefile.PL
    This is an experimental version of DBD::mysql. For production
    environments you should prefer the Msql-Mysql-modules.

    I will use the following settings for compiling and testing:

      testpassword  (default     ) =
      testhost      (default     ) =
      testuser      (default     ) =
      nocatchstderr (default     ) = 0
      libs          (mysql_config) = -L/usr/local/lib/mysql -lmysqlclient -lz -lm
      testdb        (default     ) = test
      cflags        (Users choice) = -I'/usr/local/mysql/include'

    To change these settings, see 'perl Makefile.PL --help' and
    'perldoc INSTALL'.

    Using DBI 1.18 installed in /Library/Perl/darwin/auto/DBI
    Writing Makefile for DBD::mysql
Now, the installation becomes identical to any other software. The same make commands apply. The best step to take next is to type make to compile and then type make test to test the compiled software:

    % make
    cc -c -I/Library/Perl/darwin/auto/DBI -I'/usr/local/mysql/include' -g -pipe -pipe -fno-common -DHAS_TELLDIR_PROTOTYPE -fno-strict-aliasing -O3 -DVERSION=\"2.0901\" -DXS_VERSION=\"2.0901\" -I/System/Library/Perl/darwin/CORE dbdimp.c
    ...
    % make test
    t/00base............ok
    t/10dsnlist.........ok
    t/20createdrop......ok
    t/30insertfetch.....ok
    t/40bindparam.......ok
    t/40blobs...........ok
    t/40listfields......ok
    t/40nulls...........ok
    t/40numrows.........ok
    t/50chopblanks......ok
    t/50commit..........ok, 14/30 skipped: No transactions
    t/60leaks...........skipped test on this platform
    t/ak-dbd............ok
    t/akmisc............ok
    t/dbdadmin..........ok
    t/insertid..........ok
    t/mysql2............ok
    t/mysql.............ok
    All tests successful, 1 test and 14 subtests skipped.
    Files=18, Tests=758, 25 wallclock secs ( 3.59 cusr + 0.35 csys = 3.94 CPU)

Finally, type sudo make install to install the Perl module:

    % sudo make install
    Skipping /Library/Perl/darwin/auto/DBD/mysql/mysql.bs (unchanged)
    Installing /Library/Perl/darwin/auto/DBD/mysql/mysql.bundle
    Files found in blib/arch: installing files in blib/lib into architecture dependent tree
    ...
    Installing /usr/share/man/man3/Bundle::DBD::mysql.3
    Installing /usr/share/man/man3/DBD::mysql.3
    Installing /usr/share/man/man3/DBD::mysql::INSTALL.3
    Installing /usr/share/man/man3/Mysql.3
    Writing /Library/Perl/darwin/auto/DBD/mysql/.packlist
    Appending installation info to /System/Library/Perl/darwin/perllocal.pod

The module has been installed and is ready to use. Chapter 19 demonstrates how to use these Perl modules to communicate with a MySQL database.

Accessing Perl Documentation

Retrieving help information on a Perl function or module is as simple as using the perldoc command.
perldoc searches the installed Perl documentation and displays extensive help information on functions, modules, and generalized topics. There are three common forms for using perldoc.

The first, perldoc -f <function name>, returns formatted information about a given built-in Perl function, such as open:

    % perldoc -f open
    open FILEHANDLE,EXPR
    open FILEHANDLE,MODE,EXPR
    open FILEHANDLE,MODE,EXPR,LIST
    open FILEHANDLE,MODE,REFERENCE
    open FILEHANDLE
            Opens the file whose filename is given by EXPR, and
            associates it with FILEHANDLE.

            (The following is a comprehensive reference to open():
            for a gentler introduction you may consider perlopentut.)

            If FILEHANDLE is an undefined scalar variable (or array
            or hash element) the variable is assigned a reference to
            a new anonymous filehandle, otherwise if FILEHANDLE is an
            expression, its value is used as the name of the real
            filehandle wanted. (This is considered a symbolic
            reference, so "use strict 'refs'" should not be in
            effect.)
    ...

Next, perldoc -q <faq topic> retrieves information from the Perl FAQ. For example, to retrieve information about regular expressions:

    % perldoc -q expressions
    Found in /System/Library/Perl/pods/perlfaq6.pod
      How can I hope to use regular expressions without creating
      illegible and unmaintainable code?

      Three techniques can make regular expressions maintainable
      and understandable.

      Comments Outside the Regex
          Describe what you're doing and how you're doing it, using
          normal Perl comments.

Finally, use perldoc <module name> to retrieve information about an installed Perl module:

    % perldoc Shell
    Shell(3)      User Contributed Perl Documentation      Shell(3)

    NAME
        Shell - run shell commands transparently within perl

    SYNOPSIS
        See below.
    ...

Table 18.5 provides many of the flags for the perldoc command.
Perl's built-in documentation is an excellent comprehensive reference that provides both usage information and complete code examples.

Perl Editors and IDEs

Although writing Perl in emacs or vi is completely acceptable, it is not necessarily a Mac-like experience. A few commercial and shareware editors that support syntax highlighting and direct Perl execution can make life easier for the serious Perl developer.
Additional Perl Information

The information in this chapter should be enough to get you started authoring and editing Perl scripts. In Chapter 19, you'll learn how to extend Perl to control another free software package: MySQL. In Chapter 24, you'll see how Perl can be used to author online applications.

As with many topics in this book, space just isn't available for a completely comprehensive text. If you like what you see, you can learn more about Perl through these resources: