10.1. Working with Files and Directories

When we say file, we usually mean a disk file, though not always. We do use the concept of a file as a meaningful abstraction in Ruby as in other programming languages. When we say directory, we mean a directory in the normal Windows or UNIX sense.

The File class is closely related to the IO class from which it inherits. The Dir class is not so closely related, but we chose to discuss files and directories together because they are still conceptually related.

10.1.1. Opening and Closing Files

The class method File.new, which instantiates a File object, also opens that file. The first parameter is naturally the filename.

The optional second parameter is called the mode string, telling how to open the file (whether for reading, writing, and so on). (The mode string has nothing to do with the mode as in permissions.) This defaults to "r" for reading. The following code demonstrates opening files for reading and writing.

file1 = File.new("one")       # Open for reading
file2 = File.new("two", "w")  # Open for writing

Another form for new takes three parameters. In this case, the second parameter is a set of flags ORed together, and the third specifies the original permissions for the file (usually as an octal constant). The flags are constants such as File::CREAT (create the file when it is opened if it doesn't already exist) and File::RDONLY (open for reading only). This form will rarely be used.

file = File.new("three", File::CREAT|File::WRONLY, 0755)

As a courtesy to the operating system and the runtime environment, always close a file that you open. In the case of a file open for writing, this is more than mere politeness and can actually prevent lost data. Not surprisingly, the close method serves this purpose:

out = File.new("captains.log", "w")
# Process as needed...
out.close

There is also an open method. In its simplest form, it is merely a synonym for new as we see here:

trans = File.open("transactions","w")

But open can also take a block; this is the form that is more interesting. When a block is specified, the open file is passed in as a parameter to the block. The file remains open throughout the scope of the block and is closed automatically at the end. For example:

File.open("somefile","w") do |file|
  file.puts "Line 1"
  file.puts "Line 2"
  file.puts "Third and final line"
end
# The file is now closed

This is obviously an elegant way of ensuring that a file is closed when we've finished with it. In addition, the code that handles the file is grouped visually into a unit.
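The block form also closes the file when the block exits through an exception. The following is a minimal sketch of that behavior; the scratch directory, the filename, and the simulated failure are ours, purely for illustration.

```ruby
require 'tmpdir'

handle = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "somefile")   # hypothetical scratch file

  begin
    File.open(path, "w") do |file|
      handle = file                   # keep a reference so we can inspect it later
      file.puts "Line 1"
      raise "simulated failure"
    end
  rescue RuntimeError
    # The exception propagates, but the block form has already closed the file.
  end

  puts handle.closed?
end
```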

10.1.2. Updating a File

Suppose that we want to open a file for reading and writing. This is done simply by adding a plus sign (+) in the file mode when we open the file (see section 10.1.1, "Opening and Closing Files"):

f1 = File.new("file1", "r+")
# Read/write, starting at beginning of file.
f2 = File.new("file2", "w+")
# Read/write; truncate existing file or create a new one.
f3 = File.new("file3", "a+")
# Read/write; start at end of existing file or create a
# new one.
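To make the difference between the three update modes concrete, here is a small sketch that writes through each mode and records what the file contains afterward. The scratch file and the results hash are illustrative names of our own.

```ruby
require 'tmpdir'

results = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, "demo.txt")   # hypothetical scratch file
  File.write(path, "abcdef")

  # "r+" starts at the beginning and overwrites in place.
  File.open(path, "r+") { |f| f.write("XY") }
  results[:r_plus] = File.read(path)

  # "a+" always writes at the end of the existing contents.
  File.open(path, "a+") { |f| f.write("!") }
  results[:a_plus] = File.read(path)

  # "w+" truncates the file before writing.
  File.open(path, "w+") { |f| f.write("new") }
  results[:w_plus] = File.read(path)
end

p results
```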

10.1.3. Appending to a File

Suppose that we want to append information onto an existing file. This is done simply by using "a" in the file mode when we open the file (see section 10.1.1, "Opening and Closing Files"):

logfile = File.open("captains_log", "a")
# Add a line at the end, then close.
logfile.puts "Stardate 47824.1: Our show has been canceled."
logfile.close

10.1.4. Random Access to Files

If you want to read a file randomly rather than sequentially, you can use the method seek, which File inherits from IO. The simplest usage is to seek to a specific byte position. The position is relative to the beginning of the file, where the first byte is numbered 0.

# myfile contains only: abcdefghi
file = File.new("myfile")
file.seek(5)
str = file.gets                   # "fghi"

If you took care to ensure that each line was a fixed length, you could seek to a specific line, as in the following example:

# Assume 20 bytes per line.
# Line N starts at byte (N-1)*20
file = File.new("fixedlines")
file.seek(5*20)                   # Sixth line!
# Elegance is left as an exercise.
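One possible take on that exercise is to wrap the arithmetic in a small helper. This is only a sketch: the method name fixed_line, the constant LINE_WIDTH, and the generated test file are our own inventions, and the file is opened in binary mode so that line endings are not translated.

```ruby
require 'tmpdir'

LINE_WIDTH = 20   # bytes per line, including the trailing newline (assumed)

# Hypothetical helper: return line n (1-based) from a file of fixed-width lines.
def fixed_line(file, n, width = LINE_WIDTH)
  file.seek((n - 1) * width)
  file.read(width)
end

sixth = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "fixedlines")
  File.open(path, "wb") do |f|
    # Pad every line to 19 characters and add a newline: 20 bytes each.
    1.upto(10) { |i| f.write(format("line %02d", i).ljust(LINE_WIDTH - 1) + "\n") }
  end
  File.open(path, "rb") { |f| sixth = fixed_line(f, 6) }
end

puts sixth.strip
```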

If you want to do a relative seek, you can use a second parameter. The constant IO::SEEK_CUR specifies that the offset (which may be negative) is relative to the current position.

file = File.new("somefile")
file.seek(55)                 # Position is 55
file.seek(-22, IO::SEEK_CUR)  # Position is 33
file.seek(47, IO::SEEK_CUR)   # Position is 80

You can also seek relative to the end of the file. Only a negative offset makes sense here:

file.seek(-20, IO::SEEK_END)  # twenty bytes from eof

There is also a third constant IO::SEEK_SET, but it is the default value (seek relative to beginning of file).

The method tell reports the file position; pos is an alias:

file.seek(20)
pos1 = file.tell             # 20
file.seek(50, IO::SEEK_CUR)
pos2 = file.pos              # 70

The rewind method can also be used to reposition the file pointer at the beginning. This terminology comes from the use of magnetic tapes.
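The seek variants, tell, pos, and rewind can be exercised together on a file of known length. The sketch below uses a temporary file of exactly 100 bytes; the file and the positions array are illustrative only.

```ruby
require 'tempfile'

positions = []
tail = nil
Tempfile.create("seekdemo") do |file|
  file.write("0123456789" * 10)     # exactly 100 bytes
  file.flush

  file.seek(55)                     # absolute (IO::SEEK_SET is the default)
  positions.push(file.tell)

  file.seek(-22, IO::SEEK_CUR)      # relative to the current position
  positions.push(file.pos)

  file.seek(-20, IO::SEEK_END)      # relative to the end of the file
  positions.push(file.pos)
  tail = file.read                  # the last twenty bytes

  file.rewind                       # back to the beginning
  positions.push(file.pos)
end

p positions
```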

If you are performing random access on a file, you may want to open it for update (reading and writing). Updating a file is done by specifying a plus sign (+) in the mode string. See section 10.1.2, "Updating a File."

10.1.5. Working with Binary Files

In days gone by, C programmers used the "b" character appended to the mode string to open a file as a binary. (Contrary to popular belief, this was true of UNIX in earlier versions.) This character is still allowed for compatibility in most cases; but nowadays binary files are not so tricky as they used to be. A Ruby string can easily hold binary data, and a file need not be read in any special way.

The exception is the Windows family of operating systems, where this distinction still survives. The chief difference between binary and text files on these platforms is that in binary mode, the end-of-line is not translated into a single linefeed but is kept as a carriage-return/linefeed pair.

The other important difference is that control-Z is treated as end-of-file if the file is not opened in binary mode, as shown here:

# Create a file (in binary mode)
File.open("myfile","wb") {|f| f.syswrite("12345\0326789\r") }
# Above note the embedded octal 032 (^Z)
# Read it as binary
str = nil
File.open("myfile","rb") {|f| str = f.sysread(15) }
puts str.size           # 11
# Read it as text
str = nil
File.open("myfile","r") {|f| str = f.sysread(15) }
puts str.size           # 5

The following code fragment shows that carriage returns remain untranslated in binary mode on Windows:

# Input file contains a single line: Line 1.
file = File.open("data")
line = file.readline             # "Line 1.\n"
puts "#{line.size} characters."  # 8 characters
file.close
file = File.open("data","rb")
line = file.readline             # "Line 1.\r\n"
puts "#{line.size} characters."  # 9 characters
file.close

Note that the binmode method, shown in the following code example, can switch a stream to binary mode. Once switched, it cannot be switched back.

file = File.open("data")
file.binmode
line = file.readline             # "Line 1.\r\n"
puts "#{line.size} characters."  # 9 characters
file.close

If you really want to do low-level input/output, you can use the sysread and syswrite methods. The former takes a number of bytes as a parameter; the latter takes a string and returns the actual number of bytes written. (You should not use other methods to read from the same stream; the results may be unpredictable.)

input = File.new("infile")
output = File.new("outfile", "w")   # Must be opened for writing
instr = input.sysread(10)
bytes = output.syswrite("This is a test.")

Note that sysread raises EOFError if it is invoked at end of file (though not if it encounters end of file during a successful read). Either of these methods will raise SystemCallError when an error occurs.

Note that the Array method pack and the String method unpack can be useful in dealing with binary data.
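As a brief illustration of pack and unpack with a binary file, the sketch below writes a three-field record and reads it back. The record layout and its values are hypothetical; the format string uses N (32-bit unsigned, big-endian), G (64-bit float, big-endian), and a3 (three raw bytes).

```ruby
require 'tmpdir'

record = [1701, 47824.1, "NCC"]     # hypothetical record: id, stardate, tag
packed_size = nil
decoded = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "record.bin")

  # Write the packed bytes in binary mode so nothing is translated.
  File.open(path, "wb") { |f| f.write(record.pack("NGa3")) }

  bytes = File.open(path, "rb") { |f| f.read }
  packed_size = bytes.bytesize      # 4 + 8 + 3 bytes
  decoded = bytes.unpack("NGa3")
end

p decoded
```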

10.1.6. Locking Files

On operating systems where it is supported, the flock method of File will lock or unlock a file. Its parameter is one of these constants: File::LOCK_EX, File::LOCK_NB, File::LOCK_SH, File::LOCK_UN, or a logical-OR of two or more of these. Note, of course, that many of these combinations will be nonsensical; the non-blocking flag is the one most frequently used.

file = File.new("somefile")
file.flock(File::LOCK_EX)  # Lock exclusively; no other process
                           # may use this file.
file.flock(File::LOCK_UN)  # Now unlock it.
file.flock(File::LOCK_SH)  # Lock file with a shared lock (other
                           # processes may do the same).
file.flock(File::LOCK_UN)  # Now unlock it.
locked = file.flock(File::LOCK_EX | File::LOCK_NB)
# Try to lock the file, but don't block if we can't; in that case,
# locked will be false.

This function is not available on the Windows family of operating systems.
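A common way to use flock is to pair the lock and unlock calls around a block so that the lock is always released. The following sketch assumes a platform where flock is supported; the helper name with_exclusive_lock and the counter file are our own.

```ruby
require 'tmpdir'

# Hypothetical helper: run the block while holding an exclusive lock.
def with_exclusive_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX)
    begin
      yield f
    ensure
      f.flock(File::LOCK_UN)        # release even if the block raises
    end
  end
end

count = nil
Dir.mktmpdir do |dir|
  counter = File.join(dir, "counter")
  3.times do
    with_exclusive_lock(counter) do |f|
      n = f.read.to_i + 1           # an empty file reads as 0
      f.rewind
      f.write(n.to_s)
      f.truncate(f.pos)
    end
  end
  count = File.read(counter).to_i
end

puts count
```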

10.1.7. Performing Simple I/O

You are already familiar with some of the I/O routines in the Kernel module; these are the ones we have called all along without specifying a receiver for the methods. Calls such as gets and puts originate here; others are print, printf, and p (which calls the object's inspect method to display it in some way readable to humans).

There are some others that we should mention for completeness, though. The putc method outputs a single character. (The corresponding method getc is not implemented in Kernel for technical reasons; it can be found in any IO object, however.) If a String is specified, the first character of the string will be taken.

putc(?\n)   # Output a newline
putc("X")   # Output the letter X

A reasonable question is where does output go when we use these methods without a receiver. Well, to begin with, three constants are defined in the Ruby environment corresponding to the three standard I/O streams we are accustomed to on UNIX. These are STDIN, STDOUT, and STDERR. All are global constants of the type IO.

There is also a global variable called $stdout, which is the destination of all the output coming from Kernel methods. This is initialized (indirectly) to the value of STDOUT so that this output all gets written to standard output as we expect. The variable $stdout can be reassigned to refer to some other IO object at any time.

diskfile = File.new("foofile","w")
puts "Hello..."      # prints to stdout
$stdout = diskfile
puts "Goodbye!"      # prints to "foofile"
diskfile.close
$stdout = STDOUT     # reassign to default
puts "That's all."   # prints to stdout
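Because $stdout can refer to any IO-like object, the same technique can capture output in memory. Here is a minimal sketch using StringIO from the standard library; the helper name capture_stdout is hypothetical, and the ensure clause restores the previous stream even if the block raises.

```ruby
require 'stringio'

# Hypothetical helper: collect everything the block writes to $stdout.
def capture_stdout
  saved = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = saved                   # always put the original stream back
end

captured = capture_stdout { puts "Goodbye!" }
puts "Captured #{captured.size} characters."
```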

Besides gets, Kernel also has methods readline and readlines for input. The former is equivalent to gets except that it raises EOFError at the end of a file instead of just returning a nil value. The latter is equivalent to the IO.readlines method (that is, it reads an entire file into memory).
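The difference at end of file is easy to see on any IO-like object, since the instance methods gets and readline behave the same way as their Kernel counterparts in this respect. The sketch below uses an in-memory StringIO as a stand-in for a real input stream.

```ruby
require 'stringio'

input = StringIO.new("only line\n")
first  = input.gets        # the single line
second = input.gets        # nil, because we are at end of file

input.rewind
input.readline             # consumes the single line again
hit_eof = begin
  input.readline           # nothing left, so this raises
  false
rescue EOFError
  true
end

p [first, second, hit_eof]
```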

Where does input come from? Well, there is also the standard input stream $stdin, which defaults to STDIN. In the same way, there is a standard error stream ($stderr defaulting to STDERR).

There is also an interesting global object called ARGF, which represents the concatenation of all the files named on the command line. It is not really a File object, though it resembles one. Default input is connected to this object in the event files are named on the command line.

# Read all files, then output again
puts ARGF.read
# Or more memory-efficient:
while ! ARGF.eof?
  puts ARGF.readline
end
# Example:  ruby cat.rb file1 file2 file3

Reading from standard input (STDIN) will bypass the Kernel methods. That way you can bypass ARGF (or not), as shown here:

# Read a line from standard input
str1 = STDIN.gets
# Read a line from ARGF
str2 = ARGF.gets
# Now read again from standard input
str3 = STDIN.gets

10.1.8. Performing Buffered and Unbuffered I/O

Ruby does its own internal buffering in some cases. Consider this fragment:

print "Hello... "
sleep 10
print "Goodbye!\n"

If you run this, you may notice that the hello and goodbye messages both appear at the same time, after the sleep. Because the first output is not terminated by a newline, it can sit in the output buffer until more output arrives.

This can be fixed by calling flush to flush the output buffer. In this case we use the stream $stdout (the default stream for all Kernel method output) as the receiver. It then behaves as we probably wanted, with the first message appearing earlier than the second one.

print "Hello... "
$stdout.flush
sleep 10
print "Goodbye!\n"

This buffering can be turned off (or on) with the sync= method; the sync method lets us know the status.

buf_flag = STDOUT.sync    # true
STDOUT.sync = false
buf_flag = STDOUT.sync    # false

There is also at least one lower level of buffering going on behind the scenes. Just as the getc method returns a character and moves the file or stream pointer, so ungetc will push a character back onto the stream.

ch = mystream.getc    # ?A
mystream.ungetc(?C)
ch = mystream.getc    # ?C

You should be aware of three things. First, the buffering we speak of here is unrelated to the buffering mentioned earlier in this section; in other words, sync=false won't turn it off. Second, only one character can be pushed back; if you attempt more than one, only the last one will actually be pushed back onto the input stream. Finally, the ungetc method will not work for inherently unbuffered read operations (such as sysread).
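A typical use of ungetc is to peek at the next character without consuming it. The sketch below uses a StringIO as a stand-in for a real stream; the helper name peek_char is our own. Note that in current Ruby versions getc returns a one-character string.

```ruby
require 'stringio'

# Hypothetical helper: look at the next character without consuming it.
def peek_char(io)
  ch = io.getc
  io.ungetc(ch) if ch      # push it back unless we hit end of file
  ch
end

stream = StringIO.new("AB")
peeked = peek_char(stream)
rest = stream.read         # the peeked character is still there

p [peeked, rest]
```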

10.1.9. Manipulating File Ownership and Permissions

The issue of file ownership and permissions is highly platform dependent. Typically UNIX provides a superset of the functionality; for other platforms many features may be unimplemented.

To determine the owner and group of a file (which are integers), File::Stat has a pair of instance methods uid and gid as shown here:

data = File.stat("somefile")
owner_id = data.uid
group_id = data.gid

Class File::Stat has an instance method mode, which will return the mode (or permissions) of the file.

perms = File.stat("somefile").mode

File has class and instance methods named chown to change the owner and group IDs of a file. The class method accepts an arbitrary number of filenames. Where an ID is not to be changed, nil or -1 can be used.

uid = 201
gid = 10
File.chown(uid, gid, "alpha", "beta")
f1 = File.new("delta")
f1.chown(uid, gid)
f2 = File.new("gamma")
f2.chown(nil, gid)      # Keep original owner id

Likewise, the permissions can be changed by chmod (also implemented both as class and instance methods). The permissions are traditionally represented in octal, though they need not be.

File.chmod(0644, "epsilon", "theta")
f = File.new("eta")
f.chmod(0444)

A process always runs under the identity of some user (possibly root); as such, there is a user id associated with it. (Here we are talking about the effective user ID.) We frequently need to know whether that user has permission to read, write, or execute a given file. There are instance methods in File::Stat to make this determination.

info = File.stat("/tmp/secrets")
rflag = info.readable?
wflag = info.writable?
xflag = info.executable?

Sometimes we need to distinguish between the effective user ID and the real user ID. The appropriate instance methods are readable_real?, writable_real?, and executable_real?, respectively.

info = File.stat("/tmp/secrets")
rflag2 = info.readable_real?
wflag2 = info.writable_real?
xflag2 = info.executable_real?

We can test the ownership of the file as compared with the effective user ID (and group ID) of the current process. The class File::Stat has instance methods owned? and grpowned? to accomplish this.

Note that many of these methods can also be found in the module FileTest:

rflag = FileTest::readable?("pentagon_files")
# Other methods are: writable? executable? readable_real? writable_real?
# executable_real? owned? grpowned?
# Not found here: uid gid mode

The umask associated with a process determines the initial permissions of new files created. The standard mode 0777 is logically ANDed with the negation of the umask so that the bits set in the umask are "masked" or cleared. If you prefer, you can think of this as a simple subtraction (without borrow). Thus a umask of 022 results in files being created with a mode of 0755.
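The masking arithmetic can be checked directly. This small sketch only does the bit manipulation described above; it does not touch the process umask.

```ruby
umask = 0o022
mode = 0o777 & ~umask      # clear every bit that is set in the umask

printf("%04o\n", mode)
```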

The umask can be retrieved or set with the class method umask of class File. If a parameter is specified, the umask will be set to that value (and the previous value will be returned).

File.umask(0237)             # Set the umask
current_umask = File.umask   # 0237

Some file mode bits (such as the sticky bit) are not strictly related to permissions. For a discussion of these, see section 10.1.12, "Checking Special File Characteristics."

10.1.10. Retrieving and Setting Time Stamp Information

Each disk file has multiple time stamps associated with it (though there are some variations between operating systems). The three time stamps that Ruby understands are the modification time (the last time the file contents were changed), the access time (the last time the file was read), and the change time (the last time the file's directory information was changed).

These three pieces of information can be accessed in three different ways. Each of these fortunately gives the same results.

The File class methods mtime, atime, and ctime return the times without the file being opened or any File object being instantiated.

t1 = File.mtime("somefile")
# Thu Jan 04 09:03:10 GMT-6:00 2001
t2 = File.atime("somefile")
# Tue Jan 09 10:03:34 GMT-6:00 2001
t3 = File.ctime("somefile")
# Sun Nov 26 23:48:32 GMT-6:00 2000

If there happens to be a File instance already created, the instance method can be used.

myfile = File.new("somefile")
t1 = myfile.mtime
t2 = myfile.atime
t3 = myfile.ctime

And if there happens to be a File::Stat instance already created, it has instance methods to do the same thing.

myfile = File.new("somefile")
info = myfile.stat
t1 = info.mtime
t2 = info.atime
t3 = info.ctime

Note that a File::Stat is returned by File's class (or instance) method stat. The class method lstat (or the instance method of the same name) is identical except that it reports on the status of the link itself instead of following the link to the actual file. In the case of links to links, all links are followed but the last one.
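The difference between stat and lstat shows up as soon as a symbolic link is involved. The following sketch assumes a platform that supports symbolic links; the filenames are created in a scratch directory and are purely illustrative.

```ruby
require 'tmpdir'

via_stat = nil
via_lstat = nil
Dir.mktmpdir do |dir|
  target = File.join(dir, "yourfile")
  link   = File.join(dir, "myfile")
  File.write(target, "contents")
  File.symlink(target, link)

  via_stat  = File.stat(link).ftype    # follows the link to the real file
  via_lstat = File.lstat(link).ftype   # reports on the link itself
end

p [via_stat, via_lstat]
```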

File access and modification times may be changed using the utime method. It will change the times on one or more files specified. The times may be given either as Time objects or a number of seconds since the epoch.

today = Time.now
yesterday = today - 86400
File.utime(today, today, "alpha")
File.utime(today, yesterday, "beta", "gamma")

Because both times are changed together, if you want to leave one of them unchanged, you have to save it off first.

mtime = File.mtime("delta")
File.utime(Time.now, mtime, "delta")
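The same idea works in the other direction: to change only the modification time, save the access time first and pass it back unchanged. This is a sketch against a scratch file of our own making.

```ruby
require 'tmpdir'

yesterday = Time.now - 86400
new_mtime = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "delta")       # hypothetical scratch file
  File.write(path, "data")

  atime = File.atime(path)             # save the time we want to keep
  File.utime(atime, yesterday, path)   # arguments: access time, then modification time
  new_mtime = File.mtime(path)
end

puts new_mtime
```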

10.1.11. Checking File Existence and Size

One fundamental question we sometimes want to know is whether a file of a given name exists. The exist? method in the FileTest module provides a way to find out:

flag = FileTest::exist?("LochNessMonster")
flag = FileTest::exists?("UFO")
# exists? is an older synonym for exist? (deprecated, and removed in
# recent Ruby versions)

Intuitively, such a method could not be an instance method of File, because by the time the object is instantiated the file has been opened. File does, however, have a class method exist? that behaves the same way.

Related to the question of a file's existence is the question of whether it has any contents. After all, a file may exist but have zero length (which is the next best thing to not existing).

If we are only interested in this yes/no question, File::Stat has two instance methods that are useful. The method zero? returns true if the file is zero length and false otherwise:

flag = File.new("somefile").stat.zero?

Conversely, the method size? returns either the size of the file in bytes if it is nonzero length, or the value nil if it is zero length. It may not be immediately obvious why nil is returned rather than 0. The answer is that the method is primarily intended for use as a predicate, and 0 is true in Ruby, whereas nil tests as false.

if File.new("myfile").stat.size?
  puts "The file has contents."
else
  puts "The file is empty."
end

Methods zero? and size? also appear in the FileTest module:

flag1 = FileTest::zero?("file1")
flag2 = FileTest::size?("file2")

This leads naturally to the question "How big is this file?" We've already seen that in the case of a nonempty file, size? returns the length; but if we're not using it as a predicate, the nil value would confuse us.

The File class has a class method size to give us this answer. An open File object also responds to an instance method of the same name, and File::Stat has a corresponding instance method.

size1 = File.size("file1")
size2 = File.stat("file2").size
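Putting zero?, size?, and size side by side makes the difference in return values clear. The two scratch files below are created only for the comparison.

```ruby
require 'tmpdir'

report = {}
Dir.mktmpdir do |dir|
  empty = File.join(dir, "empty")
  full  = File.join(dir, "full")
  File.write(empty, "")
  File.write(full, "12345")

  # Each entry holds the results of zero?, size?, and size, in that order.
  report[:empty] = [File.zero?(empty), File.size?(empty), File.size(empty)]
  report[:full]  = [File.zero?(full),  File.size?(full),  File.size(full)]
end

p report
```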

If we want the file size in blocks rather than bytes, we can use the instance method blocks in File::Stat. This is certainly dependent on the operating system. (The method blksize also reports on the operating system's idea of how big a block is.)

info = File.stat("somefile")
total_bytes = info.blocks * info.blksize

10.1.12. Checking Special File Characteristics

There are numerous aspects of a file that we can test. We summarize here the relevant built-in methods that we don't discuss elsewhere. Most, though not all, are predicates.

Bear in mind two facts throughout this section (and most of this chapter). First, because the File class provides the same tests as FileTest, any test that can be done by invoking the method qualified with the module name may also be called as a class method of File. Second, remember that there is a high degree of overlap between the FileTest module and the File::Stat object returned by stat (or lstat). In some cases, there will be three different ways to call what is essentially the same method. We won't necessarily show this every time.

Some operating systems have the concept of block-oriented devices as opposed to character-oriented devices. A file may refer to either but not both. The methods blockdev? and chardev? in the FileTest module test for this:

flag1 = FileTest::chardev?("/dev/hdisk0")  # false
flag2 = FileTest::blockdev?("/dev/hdisk0") # true

Sometimes we want to know whether the stream is associated with a terminal. The IO instance method tty? tests for this (as does the synonym isatty):

flag1 = STDIN.tty?                   # true
flag2 = File.new("diskfile").isatty  # false

A stream can be a pipe or a socket. There are corresponding FileTest methods to test for these cases:

flag1 = FileTest::pipe?(myfile)
flag2 = FileTest::socket?(myfile)

Recall that a directory is really just a special case of a file. So we need to be able to distinguish between directories and ordinary files, which a pair of FileTest methods enable us to do.

test1 = FileTest::directory?("/tmp")         # true
test2 = FileTest::file?("/tmp")              # false
test3 = FileTest::directory?("/tmp/myfile")  # false
test4 = FileTest::file?("/tmp/myfile")       # true

There is also a File class method named ftype, which tells us what kind of thing a stream is; it can also be found as an instance method in the File::Stat class. This method returns a string that has one of the following values: file, directory, blockSpecial, characterSpecial, fifo, link, or socket. (The string fifo refers to a pipe.)

this_kind = File.ftype("/dev/hdisk0")     # "blockSpecial"
that_kind = File.new("/tmp").stat.ftype   # "directory"

Certain special bits may be set or cleared in the permissions of a file. These are not strictly related to the other bits that we discuss in section 10.1.9, "Manipulating File Ownership and Permissions". These are the set-group-id bit, the set-user-id bit, and the sticky bit. There are methods in FileTest for each of these.

file = File.new("somefile")
info = file.stat
sticky_flag = info.sticky?
setgid_flag = info.setgid?
setuid_flag = info.setuid?

A disk file may have symbolic or hard links that refer to it (on operating systems supporting these features). To test whether a file is actually a symbolic link to some other file, use the symlink? method of FileTest. To count the number of hard links associated with a file, use the nlink method (found only in File::Stat). A hard link is virtually indistinguishable from an ordinary file; in fact, it is an ordinary file that happens to have multiple names and directory entries.

File.symlink("yourfile","myfile")           # Make a link
is_sym = FileTest::symlink?("myfile")       # true
hard_count = File.new("myfile").stat.nlink  # 1 (no additional hard links)

Incidentally, note that in the previous example we used the File class method symlink to create a symbolic link.

In rare cases, you may want even lower-level information about a file. The File::Stat class has three more instance methods that give you the gory details. The method dev gives you an integer identifying the device on which the file resides, rdev returns an integer specifying the kind of device, and for disk files, ino gives you the starting inode number for the file.

file = File.new("diskfile")
info = file.stat
device = info.dev
devtype = info.rdev
inode = info.ino

10.1.13. Working with Pipes

There are various ways of reading and writing pipes in Ruby. The class method IO.popen opens a pipe and hooks the process's standard input and standard output into the IO object returned. Frequently we will have different threads handling each end of the pipe; here we just show a single thread writing and then reading:

check = IO.popen("spell","r+")
check.puts("'T was brillig, and the slithy toves")
check.puts("Did gyre and gimble in the wabe.")
check.close_write
list = check.readlines
list.collect! { |x| x.chomp }
# list is now %w[brillig gimble gyre slithy toves wabe]

Note that the close_write call is necessary. If it were not issued, we would not be able to reach the end of file when we read the pipe.

There is a block form that works as follows:

IO.popen("/usr/games/fortune") do |pipe|
  quote = pipe.gets
  puts quote
  # On a clean disk, you can seek forever. - Thomas Steel
end

If the string "-" is specified, a new Ruby instance is started. If a block is specified with this, the block is run as two separate processes rather like a fork; the child gets nil passed into the block, and the parent gets an IO object with the child's standard input and/or output connected to it.

IO.popen("-") do |mypipe|
  if mypipe
    puts "I'm the parent: pid = #{Process.pid}"
    listen = mypipe.gets
    puts listen
  else
    puts "I'm the child: pid = #{Process.pid}"
  end
end
# Prints:
#   I'm the parent: pid = 10580
#   I'm the child: pid = 10582

The class method IO.pipe returns a pair of pipe ends connected to each other. In the following code example, we create a pair of threads and let one pass a message to the other (the first message that Samuel Morse sent over the telegraph). Refer to Chapter 13, "Threads in Ruby," if this aspect confuses you.

pipe = IO.pipe
reader = pipe[0]
writer = pipe[1]
str = nil
thread1 = Thread.new(reader,writer) do |reader,writer|
  # writer.close_write
  str = reader.gets
  reader.close
end
thread2 = Thread.new(reader,writer) do |reader,writer|
  # reader.close_read
  writer.puts("What hath God wrought?")
  writer.close
end
thread1.join
thread2.join
puts str         # What hath God wrought?

10.1.14. Performing Special I/O Operations

It is possible to do lower-level I/O in Ruby. We will only mention the existence of these methods; if you need to use them, some of them will be highly machine-specific anyway (varying even between different versions of UNIX).

The ioctl method ("I/O control") accepts two arguments. The first is an integer specifying the operation to be done. The second is either an integer or a string representing a binary number.

The fcntl method is also for low-level control of file-oriented streams in a system-dependent manner. It takes the same kinds of parameters as ioctl.

The select method (in the Kernel module) accepts up to four parameters; the first is the read-array, and the last three are optional (write-array, error-array, and the timeout value). When input is available from one or more devices in the read-array, or when one or more devices in the write-array are ready, the call returns an array of three elements representing the respective arrays of devices that are ready for I/O.
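A pipe makes a convenient way to watch select at work. The sketch below calls it as IO.select, which is the same operation as the Kernel method; a timeout of zero makes the first call return immediately, and the pipe itself is only a stand-in for real devices.

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so with a zero timeout select gives up at once.
idle = IO.select([reader], nil, nil, 0)

writer.puts "ready"

# Now the read end has data, and the write end has room for more.
ready = IO.select([reader], [writer], nil, 1)
readable, writable, _errors = ready
line = readable.first.gets

reader.close
writer.close

p idle
p line
```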

The Kernel method syscall takes at least one integer parameter (and up to nine string or integer parameters in all). The first parameter specifies the I/O operation to be done.

The fileno method returns an old-fashioned file descriptor associated with an I/O stream. This is the least system-dependent of all the methods mentioned here.

desc = $stderr.fileno      # 2

10.1.15. Using Nonblocking I/O

Ruby makes a concerted effort "behind the scenes" to ensure that I/O does not block. For this reason, it is possible in most cases to use Ruby threads to manage I/O; a single thread may block on an I/O operation while another thread goes on processing.

This is a little counterintuitive. Ruby's threads are all in the same process because they are not native threads. Your expectation then might be that a blocking I/O operation would block the entire process and all the threads associated with it. The reason it doesn't work this way is that Ruby manages its I/O carefully in a way transparent to the programmer.

However, those who want to turn off nonblocking I/O can do so. The small library io/nonblock provides a simple setter, a query method, and a block-oriented setter for an IO object.

require 'io/nonblock'
# ...
test = mysock.nonblock?         # false
mysock.nonblock = true          # turn off blocking
# ...
mysock.nonblock = false         # turn on again
mysock.nonblock { some_operation(mysock) }
# Perform some_operation with nonblocking set to true
mysock.nonblock(false) { other_operation(mysock) }
# Perform other_operation with non-blocking set to false

10.1.16. Using `readpartial`

The readpartial method is a relatively new method designed to make I/O easier in certain circumstances. It is designed to be used on a stream such as a socket.

The "max length" parameter is required. If the buffer parameter is specified, it should refer to a string where the data will be stored.

data = sock.readpartial(128)  # Read at most 128 bytes

The readpartial method doesn't honor the nonblocking flag. It will sometimes block, but only when three conditions are true: The IO object's buffer is empty; the stream content is empty; and the stream has not yet reached an end-of-file condition.

So in effect, if there is data in the stream, readpartial will not block. It will read up to the maximum number of bytes specified, but if there are fewer bytes available, it will grab those and continue.

If the stream has no data, but it is at end of file, readpartial will immediately raise an EOFError.

If the call blocks, it waits until either it receives data or it detects an EOF condition. If it receives data, it simply returns it. If it detects EOF, it raises an EOFError.

When sysread is called in blocking mode, its behavior is similar to the way readpartial works. If the buffer is empty, their behavior is identical.
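The two non-blocking outcomes described above can be seen with a pipe standing in for a socket. In this sketch, the first call returns the few bytes that are available rather than waiting for 128, and the second call raises once the writer has closed its end.

```ruby
reader, writer = IO.pipe

writer.write("abc")
chunk = reader.readpartial(128)   # only 3 bytes are available; it returns them

writer.close                      # now the stream is empty and at end of file
at_eof = begin
  reader.readpartial(128)
  false
rescue EOFError
  true
end
reader.close

p [chunk, at_eof]
```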

10.1.17. Manipulating Pathnames

In manipulating pathnames, the first things to be aware of are the class methods File.dirname and File.basename; these work like the UNIX commands of the same name and return the directory name and the filename, respectively. If an extension is specified as a second parameter to basename, that extension will be removed.

str = "/home/dave/podbay.rb"
dir = File.dirname(str)           # "/home/dave"
file1 = File.basename(str)        # "podbay.rb"
file2 = File.basename(str,".rb")  # "podbay"

Note that although these are methods of File, they are really simply doing string manipulation.

A comparable method is File.split, which returns these two components (directory and filename) in a two-element array.

info = File.split(str)        # ["/home/dave","podbay.rb"]

The expand_path class method expands a relative pathname, converting to an absolute path. If the operating system understands such idioms as ~ and ~user, these will be expanded also.

Dir.chdir("/home/poole/personal/docs")
abs = File.expand_path("../../misc")    # "/home/poole/misc"
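As a small sketch, expand_path also accepts an optional second argument naming the directory to expand from, which avoids changing the current directory first. The paths here are illustrative and need not exist; a UNIX-style filesystem is assumed.

```ruby
# The base directory is passed explicitly, so the result does not
# depend on the current directory of the process.
base = "/home/poole/personal/docs"
abs  = File.expand_path("../../misc", base)

# A path that is already absolute ignores the base entirely.
same = File.expand_path("/etc/hosts", base)
```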

Given an open file, the path instance method returns the pathname used to open the file.

file = File.new("../../foobar")
name = file.path                 # "../../foobar"

The constant File::Separator gives the character used to separate pathname components. In Ruby this is a forward slash on every platform, Windows included; on Windows the backslash is available separately as File::ALT_SEPARATOR. An alias is File::SEPARATOR.

The class method join uses this separator to produce a path from a list of directory components:

path = File.join("usr","local","bin","someprog")
# path is "usr/local/bin/someprog"
# Note that it doesn't put a separator on the front!

Don't fall into the trap of thinking that File.join and File.split are somehow inverses. They're not.
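A short sketch shows why. File.split always returns exactly two elements, so splitting the result of a multi-part join does not give back the original list, and joining the result of a split can add a leading "./" that was never there. The variable names are illustrative.

```ruby
pieces  = ["usr", "local", "bin", "someprog"]
joined  = File.join(*pieces)
resplit = File.split(joined)      # only two parts come back, not four

bare     = "podbay.rb"
parts    = File.split(bare)       # the directory part becomes "."
rejoined = File.join(*parts)      # no longer equal to the original string
```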

10.1.18. Using the `Pathname` Class

You should also be aware of the standard library pathname, which gives us the Pathname class. This is essentially a wrapper for Dir, File, FileTest, and FileUtils; as such, it has much of the functionality of these, unified in a way that is supposed to be logical and intuitive.

require "pathname"
path = Pathname.new("/home/hal")
file = Pathname.new("file.txt")
p2 = path + file
path.directory?         # true
path.file?              # false
p2.directory?           # false
p2.file?                # true
parts = p2.split        # [Pathname:/home/hal, Pathname:file.txt]
ext = p2.extname        # .txt

There are also a number of convenience methods as you would expect. The root? method attempts to detect whether a path refers to the root directory; it can be "fooled" because it merely analyzes the string and does not access the filesystem. The parent method returns the pathname of this path's parent. The children method returns a list of the next-level children below this path; it includes both files and directories but is not recursive.

p1 = Pathname.new("//")           # odd but legal
p1.root?                          # true
p2 = Pathname.new("/home/poole")
p3 = p2.parent                    # Pathname:/home
items = p2.children               # array of Pathnames (all files and
                                  # dirs immediately under poole)

As you would expect, relative? and absolute? try to determine whether a path is relative or absolute (by looking for a leading slash):

p1 = Pathname.new("/home/dave")
p1.absolute?                      # true
p1.relative?                      # false

Many methods such as size, unlink, and others are actually delegated to File, FileTest, and FileUtils; the functionality is not reimplemented.
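The following sketch exercises several of these methods against a real directory. It assumes a scratch directory from Dir.mktmpdir (in the tmpdir library) so that the filesystem tests have something to examine; the report hash is only a device for collecting results.

```ruby
require "pathname"
require "tmpdir"

report = {}
Dir.mktmpdir do |tmp|
  dir  = Pathname.new(tmp)
  file = dir + "file.txt"
  File.write(file.to_s, "hello\n")

  report[:dir_is_dir]   = dir.directory?
  report[:file_is_file] = file.file?
  report[:size]         = file.size        # delegated to File
  report[:extname]      = file.extname
  report[:children]     = dir.children.map { |c| c.basename.to_s }
  report[:parent_ok]    = (file.parent == dir)

  file.unlink                              # delegated to File
  report[:gone]         = !file.exist?
end
```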

For more details on Pathname, consult ruby-doc.org or any good reference.

10.1.19. Command-Level File Manipulation

Often we need to manipulate files in a manner similar to the way we would at a command line. That is, we need to copy, delete, rename, and so on.

Many of these capabilities are built-in methods; a few are in the FileUtils module in the fileutils library. Be aware that FileUtils used to mix functionality directly into the File class by reopening it; now these methods stay in their own module.

To delete a file, we can use File.delete or its synonym File.unlink:

File.delete("history")
File.unlink("toast")

To rename a file, we can use File.rename as follows:

File.rename("Ceylon","SriLanka")

File links (hard and symbolic) can be created using File.link and File.symlink, respectively:

File.link("/etc/hosts","/etc/hostfile")   # hard link
File.symlink("/etc/hosts","/tmp/hosts")   # symbolic link

We can truncate a file to zero bytes (or any other specified number) by using the truncate class method; an instance method of the same name works on a file that is already open:

File.truncate("myfile",1000)    # Now at most 1000 bytes
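As a minimal sketch, the class method and the instance method can be compared side by side on a scratch file. The use of Dir.mktmpdir and the file contents are assumptions made to keep the example self-contained.

```ruby
require "tmpdir"

sizes = []
Dir.mktmpdir do |dir|
  path = File.join(dir, "myfile")
  File.write(path, "0123456789")     # 10 bytes
  sizes << File.size(path)

  File.truncate(path, 4)             # class method: takes a filename
  sizes << File.size(path)

  File.open(path, "r+") do |f|
    f.truncate(0)                    # instance method: file must be open for writing
  end
  sizes << File.size(path)
end
```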

Two files may be compared by means of the compare_file method. There is an alias named cmp (and there is also compare_stream).

require "fileutils"
same = FileUtils.compare_file("alpha","beta")  # true

The copy method will copy a file to a new name or location. It accepts an optional :verbose option that echoes the equivalent shell command as the copy is performed. The UNIX-like name cp is an alias.

require "fileutils"
# Copy epsilon to theta and report the operation as it runs.
FileUtils.copy("epsilon","theta", :verbose => true)

A file may be moved with the move method (alias mv). Like copy, it also has an optional verbose flag.

require "fileutils"
FileUtils.move("/tmp/names","/etc")     # Move to new directory
FileUtils.move("colours","colors")      # Just a rename

The safe_unlink method (an alias for rm_f) deletes the specified file or files, ignoring errors such as a file that does not exist. Several files are passed as an array, and a :verbose option may follow.

require "fileutils"
FileUtils.safe_unlink(["alpha","beta","gamma"])
# Report the operation for the next two files
FileUtils.safe_unlink(["delta","epsilon"], :verbose => true)

Finally, the install method basically does a syscopy, except that it first checks that the file either does not exist or has different content.

require "fileutils"
FileUtils.install("foo.so","/usr/lib")
# Existing foo.so will not be overwritten
# if it is the same as the new one.
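To see several of these operations together, here is a sketch that copies, compares, moves, and deletes files inside a scratch directory. The directory comes from Dir.mktmpdir and the results hash is purely illustrative; neither is part of FileUtils.

```ruby
require "fileutils"
require "tmpdir"

results = {}
Dir.mktmpdir do |dir|
  alpha = File.join(dir, "alpha")
  beta  = File.join(dir, "beta")
  gamma = File.join(dir, "gamma")
  File.write(alpha, "same contents\n")

  FileUtils.cp(alpha, beta)
  results[:identical] = FileUtils.compare_file(alpha, beta)

  FileUtils.mv(beta, gamma)          # just a rename within one directory
  results[:moved] = File.exist?(gamma) && !File.exist?(beta)

  # rm_f takes an array and ignores the name that does not exist.
  FileUtils.rm_f([alpha, gamma, File.join(dir, "missing")])
  results[:left] = Dir.entries(dir) - %w[. ..]
end
```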

For more on FileUtils, consult ruby-doc.org or any other reference.

10.1.20. Grabbing Characters from the Keyboard

Here we use the term grabbing because we sometimes want to process a character as soon as it is pressed rather than buffer it and wait for a newline to be entered.

This can be done in both UNIX variants and Windows variants. Unfortunately, the two methods are completely unrelated to each other.

The UNIX version is straightforward. We use the well-known technique of putting the terminal in raw mode (and we usually turn off echoing at the same time).

def getchar
  system("stty raw -echo")  # Raw mode, no echo
  char = STDIN.getc
  system("stty -raw echo")  # Reset terminal mode
  char
end

In the Windows world, we would need to write a C extension for this. An alternative for now is to use a small feature of the Win32API library.

require 'Win32API'
def getchar
  char = Win32API.new("crtdll", "_getch", [], 'L').Call
end

In either case the behavior is effectively the same.

10.1.21. Reading an Entire File into Memory

To read an entire file into an array, you need not even open the file. The method IO.readlines will do this, opening and closing the file on its own.

arr = IO.readlines("myfile")
lines = arr.size
puts "myfile has #{lines} lines in it."
longest = arr.collect {|x| x.length}.max
puts "The longest line in it has #{longest} characters."

We can also use IO.read (which returns a single large string rather than an array of lines).

str = IO.read("myfile")
bytes = str.bytesize
puts "myfile has #{bytes} bytes in it."
longest = str.each_line.collect {|x| x.length}.max     # iterate over the string's lines
puts "The longest line in it has #{longest} characters."

Obviously because IO is an ancestor of File, we can say File.readlines and File.read just as easily.
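The two approaches can be compared on a small scratch file. This sketch assumes a temporary directory from Dir.mktmpdir and sample contents chosen only for illustration.

```ruby
require "tmpdir"

lines = text = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "myfile")
  File.write(path, "one\nthree\nfive\n")

  lines = File.readlines(path)   # array of lines, newlines included
  text  = File.read(path)        # one string holding the whole file
end

line_count = lines.size
byte_count = text.bytesize
longest    = text.each_line.map { |l| l.chomp.length }.max
```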

10.1.22. Iterating Over a File by Lines

To iterate over a file a line at a time, we can use the class method IO.foreach or the instance method each. In the former case, the file need not be opened in our code.

# Print all lines containing the word "target"
IO.foreach("somefile") do |line|
  puts line if line =~ /target/
end
# Another way...
file = File.new("somefile")
file.each do |line|
  puts line if line =~ /target/
end

Note that each_line is an alias for each.
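As a sketch of both styles on a scratch file, the following collects matching lines with File.foreach and with the block form of File.open, which closes the file when the block exits. The temporary directory and sample text are assumptions for the sake of a runnable example.

```ruby
require "tmpdir"

matches_foreach = []
matches_open    = []
Dir.mktmpdir do |dir|
  path = File.join(dir, "somefile")
  File.write(path, "on target\nmiss\ntarget acquired\n")

  # Class-method form: no explicit open or close.
  File.foreach(path) do |line|
    matches_foreach << line.chomp if line =~ /target/
  end

  # Instance-method form: the block form of open closes the file for us.
  File.open(path) do |file|
    file.each_line do |line|
      matches_open << line.chomp if line =~ /target/
    end
  end
end
```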

10.1.23. Iterating Over a File by Byte

To iterate a byte at a time, use the each_byte instance method. Remember that it feeds a byte (that is, an integer) into the block; use the chr method if you need to convert to a "real" character.

file = File.new("myfile")
e_count = 0
file.each_byte do |byte|
  e_count += 1 if byte == ?e.ord   # compare integer codes
end
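Here is a self-contained sketch of the same count, run against a scratch file whose contents are known. The temporary directory and the sample text are assumptions; the letters array simply shows chr turning the first few bytes back into characters.

```ruby
require "tmpdir"

e_count = 0
letters = []
Dir.mktmpdir do |dir|
  path = File.join(dir, "myfile")
  File.write(path, "eleven geese\n")

  File.open(path, "rb") do |file|
    file.each_byte do |byte|
      e_count += 1 if byte == "e".ord          # byte is an Integer
      letters << byte.chr if letters.size < 3  # chr gives a one-character string
    end
  end
end
```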

10.1.24. Treating a String As a File

Sometimes people want to know how to treat a string as though it were a file. The answer depends on the exact meaning of the question.

An object is defined mostly in terms of its methods. The following code shows an iterator applied to an object called source; with each iteration, a line of output is produced. Can you tell the type of source by reading this fragment?

source.each_line do |line|
  puts line
end

Actually, source could be a file, or it could be a string containing embedded newlines. So in cases like these, a string can trivially be treated as a file.

When more of the file interface is needed (seek, tell, ungetc, and so on), the stringio standard library in newer versions of Ruby makes this possible.

This StringIO implementation has an interface virtually identical to the implementation shown in the first edition of this book. It also has a string accessor that refers to the contents of the string itself.

require 'stringio'
ios = StringIO.new("abcdefghijkl\nABC\n123")
ios.seek(5)
ios.print("xyz")          # print, unlike puts, appends no newline
puts ios.tell             # 8
puts ios.string.dump      # "abcdexyzijkl\nABC\n123"
c = ios.getc
puts "c = #{c}"           # c = i (105 in Ruby 1.8)
ios.ungetc(?w)
puts ios.string.dump      # "abcdexyzwjkl\nABC\n123"
puts "Ptr = #{ios.tell}"  # Ptr = 8
s1 = ios.gets             # "wjkl\n"
s2 = ios.gets             # "ABC\n"
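To make the point about methods concrete, here is a sketch in which one method is handed a plain string, a StringIO, and an open File in turn. The helper longest_line is hypothetical, written for this illustration, and relies only on its argument responding to each_line.

```ruby
require "stringio"
require "tmpdir"

# Hypothetical helper: accepts any object that responds to each_line.
def longest_line(source)
  longest = ""
  source.each_line do |line|
    line = line.chomp
    longest = line if line.length > longest.length
  end
  longest
end

text = "short\na much longer line\nmid\n"

from_string   = longest_line(text)
from_stringio = longest_line(StringIO.new(text))
from_file     = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "data.txt")
  File.write(path, text)
  from_file = File.open(path) { |f| longest_line(f) }
end
```

The method never asks what kind of object it was given, which is the sense in which a string can stand in for a file.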

10.1.25. Reading Data Embedded in a Program

When you were twelve years old and you learned BASIC by copying programs out of magazines, you may have used a DATA statement for convenience. The information was embedded in the program, but it could be read as if it originated outside.

Should you ever want to, you can do much the same thing in Ruby. The directive __END__ at the end of a Ruby program signals that embedded data follow. This can be read using the global constant DATA, which is an IO object like any other. (Note that the __END__ marker must be at the beginning of the line on which it appears.)

# Print each line backwards...
DATA.each_line do |line|
  puts line.reverse
end
__END__
A man, a plan, a canal... Panama!
Madam, I'm Adam.
,siht daer nac uoy fI
.drah oot gnikrow neeb ev'uoy

10.1.26. Reading Program Source

Suppose you wanted to access the source of your own program. This can be done using a variation on a trick we used elsewhere (see section 10.1.25 "Reading Data Embedded in a Program").

The global constant DATA is an IO object that refers to the data following the __END__ directive. But if you do a rewind operation, it resets the file pointer to the beginning of the program source.

The following program produces a listing of itself with line numbers. It is not particularly useful, but maybe you can find some other good use for this capability.

DATA.rewind
num = 1
DATA.each_line do |line|
  puts "#{'%03d' % num}  #{line}"
  num += 1
end
__END__

Note that the __END__ directive is necessary; without it, DATA cannot be accessed at all.

10.1.27. Working with Temporary Files

There are many circumstances in which we need to work with files that are all but anonymous. We don't want to trouble with naming them or making sure there is no name conflict, and we don't want to bother with deleting them.

All these issues are addressed in the Tempfile library. The new method (alias open) takes a basename as a seed string and concatenates onto it the process id and a unique sequence number. The optional second parameter is the directory to be used; it defaults to the value of environment variables TMPDIR, TMP, or TEMP, and finally the value "/tmp".

The resulting IO object may be opened and closed many times during the execution of the program. Upon termination of the program, the temporary file will be deleted.

The close method has an optional flag; if set to true, the file will be deleted immediately after it is closed (instead of waiting until program termination). The path method returns the actual pathname of the file, should you need it.

require "tempfile"
temp = Tempfile.new("stuff")
name = temp.path              # "/tmp/stuff17060.0"
temp.puts "Kilroy was here"
temp.close
# Later...
temp.open
str = temp.gets               # "Kilroy was here"
temp.close(true)              # Delete it NOW
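One common arrangement, sketched here under the assumption that the file is needed only within a single stretch of code, is to wrap the work in begin/ensure so that the temporary file is closed and deleted even if an exception is raised. The variable names are illustrative.

```ruby
require "tempfile"

temp = Tempfile.new("stuff")
path = temp.path              # remember the name before the file is deleted
line = nil
begin
  temp.puts "Kilroy was here"
  temp.rewind                 # back to the start so we can read what we wrote
  line = temp.gets
ensure
  temp.close(true)            # close and delete immediately
end

name_has_seed = File.basename(path).start_with?("stuff")
still_there   = File.exist?(path)
```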

10.1.28. Changing and Setting the Current Directory

The current directory may be determined by the use of Dir.pwd or its alias Dir.getwd; these abbreviations historically stand for print working directory and get working directory, respectively. In a Windows environment, the backslashes probably show up as normal (forward) slashes.

The method Dir.chdir may be used to change the current directory. On Windows, the logged drive may appear at the front of the string.

Dir.chdir("/var/tmp")
puts Dir.pwd           # "/var/tmp"
puts Dir.getwd         # "/var/tmp"

This method also takes a block parameter. If a block is specified, the current directory is changed only while the block is executed (and restored afterward):

Dir.chdir("/home")
Dir.chdir("/tmp") do
  puts Dir.pwd        # /tmp
  # other code...
end
puts Dir.pwd          # /home
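The block form can be checked with a short sketch that uses a scratch directory in place of /tmp and /home. File.realpath is applied before comparing because on some systems the temporary directory is reached through a symbolic link; that precaution, like the variable names, is an assumption of this example.

```ruby
require "tmpdir"

before = Dir.pwd
inside_matches = false
block_value = nil

Dir.mktmpdir do |tmp|
  block_value = Dir.chdir(tmp) do
    # Inside the block the current directory is the scratch directory.
    inside_matches = (File.realpath(Dir.pwd) == File.realpath(tmp))
    :done                      # chdir returns the value of the block
  end
end

after = Dir.pwd                # restored once the block has finished
```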

10.1.29. Changing the Current Root

On most UNIX variants, it is possible to change the current process's idea of where root or "slash" is. This is typically done for security reasons, for example, when running unsafe or untested code. The chroot method sets the new root to the specified directory.

Dir.chdir("/home/guy/sandbox/tmp")
Dir.chroot("/home/guy/sandbox")
puts Dir.pwd                     # "/tmp"

10.1.30. Iterating Over Directory Entries

The class method foreach is an iterator that successively passes each directory entry into the block. The instance method each behaves the same way.

Dir.foreach("/tmp") { |entry| puts entry }
dir = Dir.new("/tmp")
dir.each  { |entry| puts entry }

Both of the preceding code fragments print the same output (the names of all files and subdirectories in /tmp).

10.1.31. Getting a List of Directory Entries

The class method Dir.entries returns an array of all the entries in the specified directory.

list = Dir.entries("/tmp")  # %w[. .. alpha.txt beta.doc]

As shown in the preceding code, the current and parent directories are included. If you don't want these, you'll have to remove them manually.
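Removing them takes only an array subtraction. This sketch builds a scratch directory with two files so that the result is predictable; the directory and filenames are assumptions for illustration.

```ruby
require "tmpdir"

all_entries = names = nil
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "alpha.txt"), "")
  File.write(File.join(dir, "beta.doc"), "")

  all_entries = Dir.entries(dir)                   # includes "." and ".."
  names = (all_entries - %w[. ..]).sort            # order is not guaranteed, so sort
end
```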

10.1.32. Creating a Chain of Directories

Sometimes we want to create a chain of directories where the intermediate directories themselves don't necessarily exist yet. At the UNIX command line, we would use mkdir -p for this.

In Ruby code, we can do this by using the FileUtils.makedirs method (from the fileutils library):

require "fileutils"
FileUtils.makedirs("/tmp/these/dirs/need/not/exist")
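As a sketch that can be run safely, the same call is made here beneath a scratch directory from Dir.mktmpdir (an assumption of this example). Calling it a second time on a chain that already exists is harmless, just as with mkdir -p.

```ruby
require "fileutils"
require "tmpdir"

created = false
repeat_ok = false
Dir.mktmpdir do |tmp|
  chain = File.join(tmp, "these", "dirs", "need", "not", "exist")

  FileUtils.makedirs(chain)          # creates every missing level
  created = File.directory?(chain)

  FileUtils.makedirs(chain)          # already there: no error raised
  repeat_ok = true
end
```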

10.1.33. Deleting a Directory Recursively

In the UNIX world, we can type rm -rf dir at the command line, and the entire subtree starting with dir will be deleted. Obviously, we should exercise caution in doing this.

In recent versions of Ruby, Pathname has a method rmtree that will accomplish this. There is also a method rm_r in FileUtils that will do the same.

require 'pathname'
dir = Pathname.new("/home/poole/")
dir.rmtree
# or:
require 'fileutils'
FileUtils.rm_r("/home/poole")
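Because a recursive delete is unforgiving, it is worth trying on a throwaway tree first. The following sketch builds a small subtree inside a scratch directory (from Dir.mktmpdir, an assumption of this example) and removes it with rm_r.

```ruby
require "fileutils"
require "tmpdir"

before = after = nil
Dir.mktmpdir do |tmp|
  tree = File.join(tmp, "poole")
  FileUtils.mkdir_p(File.join(tree, "docs", "old"))
  File.write(File.join(tree, "docs", "old", "notes.txt"), "x")

  before = File.exist?(tree)
  FileUtils.rm_r(tree)               # removes the directory and everything below it
  after = File.exist?(tree)
end
```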

10.1.34. Finding Files and Directories

Here we use the standard library find.rb to create a method that finds one or more files and returns the list of files as an array. The first parameter is the starting directory; the second is either a filename (that is, a string) or a regular expression.

require "find"
def findfiles(dir, name)
  list = []
  Find.find(dir) do |path|
    Find.prune if [".",".."].include? path
    case name
      when String
        list << path if File.basename(path) == name
      when Regexp
        list << path if File.basename(path) =~ name
      else
        raise ArgumentError
    end
  end
  list
end
findfiles "/home/hal", "toc.txt"
# ["/home/hal/docs/toc.txt", "/home/hal/misc/toc.txt"]
findfiles "/home", /^[a-z]+\.doc/    # dot escaped so it matches only a literal "."
# ["/home/hal/docs/alpha.doc", "/home/guy/guide.doc",
#  "/home/bill/help/readme.doc"]
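When the second argument is a regular expression, remember that an unescaped dot matches any character. The following sketch compares a pattern with and without the escape on a small scratch tree; the helper basenames_matching and the filenames are hypothetical, introduced only for this comparison.

```ruby
require "find"
require "fileutils"
require "tmpdir"

# Hypothetical helper: sorted basenames of regular files matching pattern.
def basenames_matching(dir, pattern)
  found = []
  Find.find(dir) do |path|
    next unless File.file?(path)
    base = File.basename(path)
    found << base if base =~ pattern
  end
  found.sort
end

loose = strict = nil
Dir.mktmpdir do |tmp|
  FileUtils.mkdir_p(File.join(tmp, "docs"))
  File.write(File.join(tmp, "docs", "alpha.doc"), "")
  File.write(File.join(tmp, "docs", "betaxdoc"), "")   # no dot in this name
  File.write(File.join(tmp, "guide.doc"), "")

  loose  = basenames_matching(tmp, /^[a-z]+.doc/)      # "." matches any character
  strict = basenames_matching(tmp, /^[a-z]+\.doc/)     # "\." matches only a dot
end
```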