The input and output in a web application usually flow between browser, server, and database, but there are many circumstances in which files are involved too. Files are useful for retrieving remote web pages for local processing, storing data without a database, and saving information that other programs need access to. Plus, as PHP becomes a tool for more than just pumping out web pages, the file I/O functions are even more useful.
PHP's interface for file I/O is similar to that of C, although less complicated. The fundamental unit of identifying a file to read from or write to is a filehandle. This handle identifies your connection to a specific file, and you use it for operations on the file. This chapter focuses on opening and closing files and manipulating filehandles in PHP, as well as what you can do with the file contents once you've opened a file. Chapter 24 deals with directories and file metadata such as permissions.
The code in Example 23-1 opens /tmp/cookie-data and writes the contents of a specific cookie to the file.
Writing data to a file
The function fopen( ) returns a filehandle if its attempt to open the file is successful. If it can't open the file (because of incorrect permissions, for example), it returns false and generates an E_WARNING-type error. Recipes 23.1 through 23.3 cover ways to open files.
In Example 23-1, fwrite( ) writes the value of the flavor cookie to the filehandle. It returns the number of bytes written. If it can't write the string (not enough disk space, for example), it returns false.
Last, fclose( ) closes the filehandle. This is done automatically at the end of a request, but it's a good idea to explicitly close all files you open anyway. It prevents problems using the code in a command-line context and frees up system resources. It also allows you to check the return code from fclose( ). Buffered data might not actually be written to disk until fclose( ) is called, so it's here that "disk full" errors are sometimes reported.
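The open, write, close sequence described above can be sketched as follows. This is a minimal illustration, not the text of Example 23-1: the temporary path and the hardcoded $flavor value are assumptions standing in for /tmp/cookie-data and the flavor cookie.

```php
<?php
// Hypothetical stand-ins: a temporary file instead of /tmp/cookie-data,
// and a fixed string instead of the value of the flavor cookie.
$path   = tempnam(sys_get_temp_dir(), 'cookie');
$flavor = 'chocolate chip';

// fopen() returns a filehandle on success, false on failure.
$fh = fopen($path, 'w');
if ($fh === false) {
    die("can't open $path");
}

// fwrite() returns the number of bytes written, or false on failure.
$written = fwrite($fh, $flavor);
if ($written === false) {
    die("can't write to $path");
}

// Checking fclose() catches errors that surface only when buffers are flushed.
if (fclose($fh) === false) {
    die("can't close $path");
}

echo "wrote $written bytes\n";
```

Comparing against false with === matters here, because a successful fwrite( ) of an empty string returns 0, which is not an error.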
As with other processes, PHP must have the correct permissions to read from and write to a file. This is usually straightforward in a command-line context but can cause confusion when running scripts within a web server. Your web server (and consequently your PHP script) probably runs as a specific user dedicated to web serving (or perhaps as user nobody). For good security reasons, this user often has restricted permissions on what files it can access. If your script is having trouble with a file operation, make sure the web server's user or group, not yours, has permission to perform that file operation. Some web serving setups may run your script as you, though, in which case you need to make sure that your scripts can't accidentally read or write personal files that aren't part of your web site.
Because most file-handling functions just return false on error, you have to do some additional work to find more details about that error. When the track_errors configuration directive is on, each error message is put in the global variable $php_errormsg. Including this variable as part of your error output makes debugging easier, as shown in Example 23-2.
Using file-related error information
If you don't have permission to write to /tmp/cookie-data, Example 23-2 dies with this error output:
can't open: fopen("/tmp/cookie-data", "w") - Permission denied
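The same idea can be sketched on current PHP versions, where the track_errors directive and $php_errormsg are no longer available (both were removed in PHP 8). The sketch below swaps in error_get_last( ), which is a different mechanism from the one this chapter describes. The nonexistent directory in the path is an assumption chosen only to force fopen( ) to fail.

```php
<?php
// Assumption: this directory does not exist, so fopen() cannot create the file.
$path = sys_get_temp_dir() . '/no-such-dir-' . uniqid() . '/cookie-data';

// @ suppresses the E_WARNING so the script can report the failure itself.
$fh = @fopen($path, 'w');

if ($fh === false) {
    // error_get_last() returns an array describing the most recent error,
    // including errors silenced with @.
    $err = error_get_last();
    $msg = "can't open: " . ($err['message'] ?? 'unknown error');
    echo $msg, "\n";
}
```

The exact wording of the message depends on the PHP version and the operating system, so treat it as diagnostic text rather than something to parse.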
Windows and Unix treat files differently. To ensure your file access code works appropriately on Unix and Windows, take care to handle line-delimiter characters and pathnames correctly.
A line delimiter on Windows is two characters: ASCII 13 (carriage return) followed by ASCII 10 (line feed or newline). On Unix, it's just ASCII 10. The typewriter-era names for these characters explain why you can get "stair-stepped" text when printing out a Unix-delimited file. Imagine these character names as commands to the platen in a typewriter or character-at-a-time printer. A carriage return sends the platen back to the beginning of the line it's on, and a line feed advances the paper by one line. A misconfigured printer encountering a Unix-delimited file dutifully follows instructions and does a line feed at the end of each line. This advances to the next line but doesn't move the horizontal printing position back to the left margin. The next stair-stepped line of text begins (horizontally) where the previous line left off.
PHP functions that use a newline as a line-ending delimiter (for example, fgets( )) work on both Windows and Unix because a newline is the character at the end of the line on either platform.
To remove any line-delimiter characters, use the PHP function rtrim( ), as shown in Example 23-3.
Trimming trailing whitespace
This function removes any trailing whitespace in the line, including ASCII 13 and ASCII 10 (as well as tab and space). If there's whitespace at the end of a line that you want to preserve, but you still want to remove carriage returns and line feeds, provide rtrim( ) with a string containing the characters that it should remove. Other characters are left untouched. This is shown in Example 23-4.
Trimming trailing line-ending characters
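A minimal sketch of both forms of rtrim( ), using a made-up line of text (the string is an assumption, not taken from Examples 23-3 or 23-4):

```php
<?php
// A Windows-delimited line with a trailing space and tab before the delimiter.
$line = "chicken wings \t\r\n";

// With no second argument, rtrim() removes all trailing whitespace,
// including space, tab, carriage return, and line feed.
$all = rtrim($line);

// With a character list, rtrim() removes only those characters,
// so the trailing space and tab survive.
$eol = rtrim($line, "\r\n");

var_dump($all === "chicken wings");
var_dump($eol === "chicken wings \t");
```

Both var_dump( ) calls print bool(true).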
Unix and Windows also differ on the character used to separate directories in pathnames. Unix uses a slash (/), and Windows uses a backslash (\). PHP makes sorting this out easy, however, because the Windows version of PHP also understands / as a directory separator. For example, Example 23-5 successfully prints the contents of C:\Alligator\Crocodile Menu.txt.
Using forward slashes on Windows
Example 23-5 also takes advantage of the fact that Windows filenames aren't case-sensitive. Unix filenames, however, are.
Line-break confusion affects not only the code that reads and writes files but your source code files as well. If you have multiple people working on a project, make sure all developers configure their editors to use the same kind of line breaks.
Once you've opened a file, PHP gives you many tools to process its data. In keeping with PHP's C-like I/O interface, the two basic functions to read data from a file are fread( ), which reads a specified number of bytes, and fgets( ), which reads a line at a time (up to an optional specified number of bytes). Example 23-6 handles lines up to 256 bytes long.
Reading lines from a file
If orders.txt has a 300-byte line, the first fgets( ) returns only the beginning of that line, up to the length limit. Note that fgets( ) reads at most one byte fewer than the length you pass it, so a length of 256 yields at most 255 bytes. The next fgets( ) returns the rest of the line and stops when it finds the newline. The next fgets( ) after that moves to the next line of the file. Without the second argument, fgets( ) reads until it reaches the end of the line. (With PHP versions before 4.2.0, a line length is required. From PHP 4.2.0 up to 4.3.0, the length defaults to 1,024 if not specified.)
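The way fgets( ) splits an overlong line can be seen in a short sketch. The file contents here are invented for illustration: a temporary file stands in for orders.txt, with one 300-byte line (299 characters plus a newline) followed by a short line.

```php
<?php
// Hypothetical stand-in for orders.txt.
$path = tempnam(sys_get_temp_dir(), 'orders');
file_put_contents($path, str_repeat('a', 299) . "\n" . "second line\n");

$fh = fopen($path, 'r') or die("can't open $path");

// Record how many bytes each fgets() call returns.
$lengths = [];
while (($s = fgets($fh, 256)) !== false) {
    $lengths[] = strlen($s);
}

fclose($fh);
unlink($path);

print_r($lengths);   // 255, 45, 12
```

The first call stops at 255 bytes (one fewer than the length argument), the second returns the remaining 45 bytes including the newline, and the third returns the 12-byte second line.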
Many operations on file contents, such as picking a line at random (see Recipe 23.8) are conceptually simpler (and require less code) if the entire file is read into a string or array. The file_get_contents( ) function reads an entire file into a string, and the file( ) function puts each line of a file into an array. The trade-off for simplicity, however, is memory consumption. This can be especially harmful when you are using PHP as a server module. Generally, when a process (such as a web server process with PHP embedded in it) allocates memory (as PHP does to read an entire file into a string or array), it can't return that memory to the operating system until it dies. This means that calling file_get_contents( ) on a 1 MB file from PHP running as an Apache module increases the size of that Apache process by 1 MB until the process dies. Repeated a few times, this decreases server efficiency. There are certainly good reasons for processing an entire file at once, but be conscious of the memory-use implications when you do.
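The trade-off can be sketched by reading the same file three ways. The file and its contents are assumptions made up for the example; FILE_IGNORE_NEW_LINES is a standard flag to file( ) that drops the trailing newline from each array element.

```php
<?php
// Hypothetical three-line file.
$path = tempnam(sys_get_temp_dir(), 'menu');
file_put_contents($path, "egg roll\nwonton soup\nfried rice\n");

// The whole file as one string: simple, but memory use grows with file size.
$whole = file_get_contents($path);

// The whole file as an array, one element per line.
$lines = file($path, FILE_IGNORE_NEW_LINES);

// Line-at-a-time reading keeps only the current line in memory.
$count = 0;
$fh = fopen($path, 'r') or die("can't open $path");
while (fgets($fh) !== false) {
    $count++;
}
fclose($fh);
unlink($path);

echo strlen($whole), " bytes, ", count($lines), " lines, ", $count, " lines streamed\n";
```

For a file this small the difference is irrelevant; the streaming loop starts to pay off only when files are large relative to the memory you can spare per process.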
Recipes 23.17 through 23.19 deal with running other programs from within a PHP program. Some program execution operators or functions offer ways to run a program and read its output all at once (backticks) or read its last line of output (system( )). PHP can use pipes to run a program, pass it input, or read its output. Because a pipe is read with standard I/O functions (fgets( ) and fread( )), you decide how you want the input and you can do other tasks between reading chunks of input. Similarly, writing to a pipe is done with fputs( ) and fwrite( ), so you can pass input to a program in arbitrary increments.
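Reading a program's output through a pipe can be sketched with popen( ) and pclose( ). The command is an assumption chosen for portability: it relies on the system shell providing echo.

```php
<?php
// Assumption: the shell that popen() invokes has an "echo" command.
$ph = popen('echo hello', 'r');
if ($ph === false) {
    die("can't open pipe");
}

// Read the program's output a line at a time, just like a regular file.
$output = '';
while (($chunk = fgets($ph)) !== false) {
    $output .= $chunk;
    // Other work could happen here between reads.
}

// pclose() waits for the program to finish and returns its exit status.
$status = pclose($ph);

echo rtrim($output), "\n";
```

Opening the pipe with mode 'w' instead of 'r' reverses the direction, so that fwrite( ) sends input to the program.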
Pipes have the same permission issues as regular files. The PHP process must have execute permission on the program being opened as a pipe. If you have trouble opening a pipe, especially if PHP is running as a special web server user, make sure the user is allowed to execute the program to which you are opening a pipe.