Working with Files and Directories

 
   

Ruby Way
By Hal Fulton
Slots : 1.0
Table of Contents
 


When we say "file," we usually mean a disk file, although not always. We do use the concept of a file as a meaningful abstraction in Ruby as in other programming languages. When we say "directory," we mean a directory in the normal Windows or Unix sense.

The File class is closely related to the IO class from which it inherits. The Dir class is not so closely related, but we chose to discuss files and directories together because they are still conceptually related.

Opening and Closing Files

The class method File.new, which instantiates a File object, will also open that file. The first parameter is naturally the filename.

The optional second parameter is called the mode string, which tells how to open the file (for reading, writing, or whatever). The mode string has nothing to do with the mode as in permissions. This defaults to "r" for reading. Here's an example:

 

 file1 = File.new("one")       # Open for reading file2 = File.new("two", "w")  # Open for writing 

There is another form for new that takes three parameters. In this case, the second parameter specifies the original permissions for the file (usually as an octal constant), and the third is a set of flags ORed together. The flags are constants such as File::CREAT (create the file when it is opened if it doesn't already exist) and File::RDONLY (open for reading only). This form will rarely be used. Here's an example:

 

 file = File.new("three", 0755, File::CREAT|File::WRONLY) 

As a courtesy to the operating system and the runtime environment, always close a file that you open. In the case of a file open for writing, this is more than mere politeness and can actually prevent lost data. Not surprisingly, the close method will serve this purpose:

 

 out = File.new("captains.log", "w") # Process as needed... out.close 

There is also an open method. In its simplest form, it is merely a synonym for new, as shown here:

 

 trans = File.open("transactions","w") 

However, open can also take a block; this is the form that is more interesting. When a block is specified, the open file is passed in as a parameter to the block. The file remains open throughout the scope of the block and is closed automatically at the end. Here's an example:

 

 File.open("somefile","w") do |file|   file.puts "Line 1"   file.puts "Line 2"   file.puts "Third and final line" end # The file is now closed 

This is obviously an elegant way of ensuring that a file is closed when we've finished with it. In addition, the code that handles the file is grouped visually into a unit.

Updating a File

Suppose we want to open a file for reading and writing. This is done simply by adding a plus sign (+) in the file mode when we open the file (see the section titled "Opening and Closing Files" for more information). Here's an example:

 

 f1 = File.new("file1", "r+")     # Read/write, starting at beginning of file.     f2 = File.new("file2", "w+")     # Read/write; truncate existing file or create a new one.     f3 = File.new("file3", "a+")     # Read/write; start at end of existing file or create a     # new one. 

Appending to a File

Suppose we want to append information onto an existing file. This is done simply by using "a" in the file mode when we open the file (see the section titled "Opening and Closing Files" for more information). Here's an example:

 

 logfile = File.open("captains_log", "a")     # Add a line at the end, then close.     logfile.puts "Stardate 47824.1: Our show has been canceled."     logfile.close 

Random Access to Files

If you want to read a file randomly rather than sequentially, you can use the method seek, which File inherits from IO. The simplest usage is to seek to a specific byte position. The position is relative to the beginning of the file, where the first byte is numbered 0. Here's an example:

 

 # myfile contains only: abcdefghi file = File.new("myfile") file.seek(5) str = file.gets    # "fghi" 

If you took care to ensure that each line is a fixed length, you could seek to a specific line, like so:

 

 # Assume 20 bytes per line. # Line N starts at byte (N-1)*20 file = File.new("fixedlines") file.seek(5*20)                   # Sixth line! # Elegance is left as an exercise. 

If you want to do a relative seek, you can use a second parameter. The constant IO::SEEK_CUR will assume the offset is relative to the current position (which may be negative). Here's an example:

 

 file = File.new("somefile") file.seek(55)                 # Position is 55 file.seek(-22, IO::SEEK_CUR)  # Position is 33 file.seek(47, IO::SEEK_CUR)   # Position is 80 

You can also seek relative to the end of the file. Only a negative offset makes sense here:

 

 file.seek(-20, IO::SEEK_END)  # twenty bytes from eof 

There is also a third constant, IO::SEEK_SET, but it is the default value (seek relative to the beginning of file).

The method tell will report the file position (pos is an alias):

 

 file.seek(20) pos1 = file.tell             # 20 file.seek(50, IO::SEEK_CUR) pos2 = file.pos              # 70 

The rewind method can also be used to reposition the file pointer at the beginning. This terminology comes from the use of magnetic tapes.

If you are performing random access on a file, you may want to open it for updating (reading and writing). Updating a file is done by specifying a plus sign (+) in the mode string. See the section titled "Updating a File" for more information.

Working with Binary Files

In days gone by, C programmers would use the "b" character appended to the mode string in order to open a file as a binary. This character is still allowed for compatibility in most cases, but nowadays binary files are not as tricky as they used to be. A Ruby string can easily hold binary data, and a file need not be read in any special way.

The exception is the Windows family of operating systems. The chief difference between binary and text files on these platforms is that in binary mode, the end-of-line is not translated into a single linefeed but is kept as a carriage-return/linefeed pair.

The "b" character is indeed used in this circumstance:

 

 # Input file contains a single line: Line 1. file = File.open("data") line = file.readline             # "Line 1.\n" puts "#{ line.size}  characters."  # 8 characters file.close file = File.open("data","rb") line = file.readline             # "Line 1.\r\n" puts "#{ line.size}  characters."  # 9 characters file.close 

Note that the binmode method can switch a stream to binary mode. Once switched, it cannot be switched back. Here's an example:

 

 file = File.open("data") file.binmode line = file.readline             # "Line 1.\r\n" puts "#{ line.size}  characters."  # 9 characters file.close 

If you really want to do low-level input/output, you can use the sysread and syswrite methods. The former takes a number of bytes as a parameter; the latter takes a string and returns the actual number of bytes written. (You should not use other methods to read from the same stream; the results may be unpredictable.) Here's an example:

 

 input = File.new("infile") output = File.new("outfile") instr = input.sysread(10); ytes = output.syswrite("This is a test.") 

Note that sysread raises EOFError at end-of-file. Either of these methods will raise SystemCallError when an error occurs.

Note that the Array method pack and the String method unpack can be very useful in dealing with binary data.

Locking Files

On operating systems where it is supported, the flock method of File will lock or unlock a file. The second parameter is one of these constants: File::LOCK_EX, File::LOCK_NB, File::LOCK_SH, File::LOCK_UN, or a logical OR of two or more of these. Note, of course, that many of these combinations will be nonsensical; primarily the nonblocking flag will be ORed in if anything is. Here's an example:

 

 file = File.new("somefile") file.flock(File::LOCK_EX)  # Lock exclusively; no other process                            # may use this file. file.flock(File::LOCK_UN)  # Now unlock it. file.flock(File::LOCK_SH)  # Lock file with a shared lock (other                            # processes may do the same). file.flock(File::LOCK_UN)  # Now unlock it. locked = file.flock(File::LOCK_EX | File::LOCK_NB) # Try to lock the file, but don't block if we can't; in that case, # locked will be false. 

Performing Simple I/O

You are already familiar with some of the I/O routines in the Kernel module; these are the ones we have called all along without specifying a receiver for the methods. Calls such as gets and puts originate here; others are print, printf, and p (which calls the object's inspect method to display it in some way readable to humans).

There are some others that we should mention for completeness, though. For example, the putc method will output a single character. (The corresponding method getc is not implemented in Kernel for technical reasons; it can be found in any IO object, however.) If a String is specified, the first character of the string will be taken. Here's an example:

 

 putc(?\n)   # Output a newline putc("X")   # Output the letter X 

A reasonable question is, where does output go when we use these methods without a receiver? Well, to begin with, three constants are defined in the Ruby environment corresponding to the three standard I/O streams we are accustomed to in Unix. These are STDIN, STDOUT, and STDERR. All are global constants of the type IO.

There is also a global variable called $defout that is the destination of all the output coming from Kernel methods. This is initialized (indirectly) to the value of STDOUT so that this output all gets written to standard output as we expect. The variable $defout can be reassigned to refer to some other IO object at any time. Here's an example:

 

 diskfile = File.new("foofile","w") puts "Hello..."  # prints to stdout $defout = diskfile puts "Goodbye!"  # prints to "foofile" diskfile.close 

Besides gets, Kernel also has the methods readline and readlines for input. The former is equivalent to gets except that it raises EOFError at the end of a file instead of just returning a nil value. The latter is equivalent to the IO.readlines method (that is, it reads an entire file into memory).

Where does input come from? Well, there is also the standard input stream $stdin, which defaults to STDIN. In the same way, there is a standard error stream ($stderr defaulting to STDERR).

Also, an interesting global object called ARGF represents the concatenation of all the files named on the command line. It is not really a File object, although it resembles one. Standard input is connected to this object in the event files are named on the command line.

Performing Buffered and Unbuffered I/O

Ruby does its own internal buffering in some cases. Consider this fragment:

 

 print "Hello... " sleep 10 print "Goodbye!\n" 

If you run this, you will notice that the hello and goodbye messages both appear at the same time, after the sleep. The first output is not terminated by a newline.

This can be fixed by calling flush to flush the output buffer. In this case, we use the stream $defout (the default stream for all Kernel method output) as the receiver. It then behaves as we probably wanted, with the first message appearing earlier than the second one:

 

 print "Hello... " $defout.flush sleep 10 print "Goodbye!\n" 

This buffering can be turned off (or on) with the sync= method; the sync method will let us know the status:

 

 buf_flag = $defout.sync    # true $defout.sync = false buf_flag = $defout.sync    # false 

Also, at least one lower level of buffering is going on behind the scenes. Just as the getc method returns a character and moves the file or stream pointer, the ungetc method pushes a character back onto the stream:

 

 ch = mystream.getc    # ?A mystream.ungetc(?C) ch = mystream.getc    # ?C 

You should be aware of three things here. First of all, the buffering we're speaking of is unrelated to the buffering we mentioned earlier in this section; in other words, sync=false won't turn it off. Second, only one character can be pushed back; if you attempt more than one, only the last character will actually be pushed back onto the input stream. Finally, the ungetc method will not work for inherently unbuffered read operations (such as sysread).

Manipulating File Ownership and Permissions

The issue of file ownership and permissions is highly platform dependent. Typically, Unix provides a superset of the functionality; for other platforms many features may be unimplemented.

To determine the owner and group of a file (which are integers), File::Stat has a pair of instance methods, uid and gid, as shown here:

 

 data = File.stat("somefile") owner_id = data.uid group_id = data.gid 

Class File::Stat has the instance method mode, which will return the mode (or permissions) of the file:

 

 perms = File.stat.mode("somefile") 

File has class and instance methods named chown to change the owner and group IDs of a file. The class method will accept an arbitrary number of filenames. Where an ID is not to be changed, nil or -1 can be used. Here's an example:

 

 uid = 201 gid = 10 File.chown(uid, gid, "alpha", "beta") f1 = File.new("delta") f1.chown(uid, gid) f2 = File.new("gamma") f2.chown(nil, gid)      # Keep original owner id 

Likewise, the permissions can be changed by chmod (also implemented both as class and instance methods). The permissions are traditionally represented in octal format, although they need not be:

 

 File.chmod(0644, "epsilon", "theta") f = File.new("eta") f.chmod(0444) 

A process always runs under the identity of some user (possibly root); as such, a user ID is associated with it. (Here we are talking about the effective user ID.) We frequently need to know whether that user has permission to read, write, or execute a given file. There are instance methods in File::Stat to make this determination, as shown here:

 

 info = File.stat("/tmp/secrets") rflag = info.readable? wflag = info.writable? xflag = info.executable? 

Sometimes we need to distinguish between the effective user ID and the real user ID. The appropriate instance methods are readable_real?, writable_real?, and executable_real?. Here's an example:

 

 info = File.stat("/tmp/secrets") rflag2 = info.readable_real? wflag2 = info.writable_real? xflag2 = info.executable_real? 

We can test the ownership of the file as compared with the effective user ID (and group ID) of the current process. The class File::Stat has instance methods owned? and grpowned? to accomplish this.

Note that many of these methods can also be found in the module FileTest:

 

 rflag = FileTest::readable?("pentagon_files") # Other methods are: writable? executable? readable_real? writable_real? # executable_real? owned? grpowned? # Not found here: uid gid mode 

The "umask" associated with a process determines the initial permissions of new files created. The standard mode 0777 is logically ANDed with the negation of the umask so that the bits set in the umask are "masked" or cleared. If you prefer, you can think of this as a simple subtraction (without borrow). Therefore, a umask of 022 will result in files being created with a mode of 0755.

The umask can be retrieved or set with the class method umask of class File. If there is a parameter specified, the umask will be set to that value (and the previous value will be returned):

 

 File.umask(0237)             # Set the umask current_umask = File.umask   # 0237 

Some file mode bits (such as the "sticky" bit) are not strictly related to permissions. For a discussion of these, see the section titled "Checking Special File Characteristics."

Retrieving and Setting Timestamp Information

Each disk file has multiple timestamps associated with it (although there are some variations between operating systems). The three timestamps that Ruby understands are the modification time (the last time the file contents were changed), the access time (the last time the file was read), and the change time (the last time the file's directory information was changed).

These three pieces of information can be accessed in three different ways. Each of these fortunately gives the same results.

The File class methods mtime, atime, and ctime will return the times without the file being opened or any File object being instantiated:

 

 t1 = File.mtime("somefile") # Thu Jan 04 09:03:10 GMT-6:00 2001 t2 = File.atime("somefile") # Tue Jan 09 10:03:34 GMT-6:00 2001 t3 = File.ctime("somefile") # Sun Nov 26 23:48:32 GMT-6:00 2000 

If there happens to be a File instance already created, the instance method can be used:

 

 myfile = File.new("somefile") t1 = myfile.mtime t2 = myfile.atime t3 = myfile.ctime 

And if there happens to be a File::Stat instance already created, it has instance methods to do the same thing:

 

 myfile = File.new("somefile") info = myfile.stat t1 = info.mtime t2 = info.atime t3 = info.ctime 

Note that a File::Stat is returned by File's class (or instance) method stat. The class method lstat (or the instance method of the same name) is identical except that it reports on the status of the link itself instead of following the link to the actual file. In the case of links to links, all links are followed but the last one.

File access and modification times may be changed using the utime method. It will change the times on one or more files specified. The times may be given either as Time objects or as a number of seconds since the epoch. Here's an example:

 

 today = Time.now yesterday = today - 86400 File.utime(today, today, "alpha") File.utime(today, yesterday, "beta", "gamma") 

Because both times are changed together, if you want to leave one of them unchanged, you have to save it off first, as shown here:

 

 mtime = File.mtime("delta") File.utime(Time.now, mtime, "delta") 

Checking File Existence and Size

One fundamental question we sometimes want to know about a file is whether the file of the given name exists. The exist? method in the FileTest module provides a way to find out:

 

 flag = FileTest::exist?("LochNessMonster") flag = FileTest::exists?("UFO") # exists? is a synonym for exist? 

Intuitively, such a method could not be a class instance of File because by the time the object is instantiated, the file has been opened; File conceivably could have a class method exist?, but in fact it does not.

Related to the question of a file's existence is the question of whether it has any contents. After all, a file may exist but have zero length (which is the next best thing to not existing).

If we are only interested in this yes/no question, File::Stat has two instance methods that are useful. The method zero? will return true if the file is zero length; otherwise, it will return false:

 

 flag = File.new("somefile").stat.zero? 

Conversely, the method size? will return either the size of the file in bytes, if it is nonzero length, or the value nil, if it is zero length. It may not be immediately obvious why nil is returned rather than 0. The answer is that the method is primarily intended for use as a predicate, and 0 is "true" in Ruby, whereas nil tests as "false." Here's an example:

 

 if File.new("myfile").stat.size?   puts "The file has contents." else   puts "The file is empty." end 

The methods zero? and size? also appear in the FileTest module:

 

 flag1 = FileTest::zero?("file1") flag2 = FileTest::size?("file2") 

This leads naturally to the question, how big is this file? We've already seen that in the case of a nonempty file, size? will return the length, but if we're not using it as a predicate, the nil value would confuse us.

The File class has a class method (but not an instance method) to give us this answer. The instance method of the same name is inherited from the IO class, and File::Stat has a corresponding instance method:

 

 size1 = File.size("file1") size2 = File.stat("file2").size 

If we want the file size in blocks rather than bytes, we can use the instance method blocks in File::Stat. This is certainly dependent on the operating system. (The method blksize will also report on the operating system's idea of how big a block is.) Here's an example:

 

 info = File.stat("somefile") total_bytes = info.blocks * info.blksize 

Checking Special File Characteristics

There are numerous aspects of a file that we can test. We summarize here the relevant built-in methods that we don't discuss elsewhere. Most, though not all, are predicates.

Bear in mind two facts throughout this section (and most of this chapter). First of all, because File mixes in FileTest, any test that can be done by invoking the method qualified with the module name may also be called as an instance method of any file object. Second, remember that there is a high degree of overlap between the FileTest module and the File::Stat object returned by stat (or lstat). In some cases, there will be three different ways to call what is essentially the same method. We won't necessarily show this every time.

Some operating systems have the concept of block-oriented devices as opposed to character-oriented devices. A file may refer to neither, but not both. The methods blockdev? and chardev? in the FileTest module will test for this:

 

 flag1 = FileTest::chardev?("/dev/hdisk0")  # false flag2 = FileTest::blockdev?("/dev/hdisk0") # true 

Sometimes we want to know whether the stream is associated with a terminal. The IO class method tty? tests for this (as will the synonym isatty):

 

 flag1 = STDIN.tty?                   # true flag2 = File.new("diskfile").isatty  # false 

A stream can be a pipe or a socket. There are corresponding FileTest methods to test for these cases:

 

 flag1 = FileTest::pipe?(myfile) flag2 = FileTest::socket?(myfile) 

Recall that a directory is really just a special case of a file. Therefore, we need to be able to distinguish between directories and ordinary files, which a pair of FileTest methods enable us to do. Here's an example:

 

 file1 = File.new("/tmp") file2 = File.new("/tmp/myfile") test1 = file1.directory?          # true test2 = file1.file?               # false test3 = file2.directory?          # false test4 = file2.file?               # true 

Also, a File class method named ftype will tell us what kind of thing a stream is; it can also be found as an instance method in the File::Stat class. This method returns a string that has one of the following values: file, directory, blockSpecial, characterSpecial, fifo, link, or socket. (The string fifo refers to a pipe.) Here's an example:

 

 this_kind = File.ftype("/dev/hdisk0")     # "blockSpecial" that_kind = File.new("/tmp").stat.ftype   # "directory" 

Certain special bits may be set or cleared in the permissions of a file. These are not strictly related to the other bits we discuss in the section "Manipulating File Ownership and Permissions." These are the set-group-id bit, the set-user-id bit, and the sticky bit. There are methods in FileTest for each of these, as shown here:

 

 file = File.new("somefile") info = file.stat sticky_flag = info.sticky? setgid_flag = info.setgid? setuid_flag = info.setuid? 

A disk file may have symbolic or hard links that refer to it (on operating systems supporting these features). To test whether a file is actually a symbolic link to some other file, use the symlink? method of FileTest. To count the number of hard links associated with a file, use the nlink method (found only in File::Stat). A hard link is virtually indistinguishable from an ordinary file; in fact, it is an ordinary file that happens to have multiple names and directory entries. Here's an example:

 

 File.symlink("yourfile","myfile")           # Make a link is_sym = FileTest::symlink?("myfile")       # true hard_count = File.new("myfile").stat.nlink  # 0 

Incidentally, note that in this example we use the File class method symlink to create a symbolic link.

In rare cases, you may want even lower-level information about a file. The File::Stat class has three more instance methods that give you the gory details. The method dev will give you an integer identifying the device on which the file resides; rdev will return an integer specifying the kind of device; and for disk files, ino will give you the starting "inode" number for the file:

 

 file = File.new("diskfile") info = file.stat device = info.dev devtype = info.rdev inode = info.ino 

Working with Pipes

Ruby provides various ways of reading and writing pipes. The class method IO.popen will open a pipe and hook the process's standard input and standard output into the IO object returned. Frequently we will have different threads handling each end of the pipe; here we just show a single thread writing and then reading:

 

 check = IO.popen("spell","r+") check.puts("'T was brillig, and the slithy toves") check.puts("Did gyre and gimble in the wabe.") check.close_write list = check.readlines list.collect! {  |x| x.chomp } # list is now %w[brillig gimble gyre slithy toves wabe] 

Note that the close_write call is necessary. If it were not issued, we would not be able to reach end-of-file when we read the pipe.

There is also a block form:

 

 File.popen("/usr/games/fortune") do |pipe|   quote = pipe.gets   puts quote   # On a clean disk, you can seek forever. - Thomas Steel end 

If the string "-" is specified, a new Ruby instance is started. If a block is specified with this, the block is run as two separate processes rather like a fork; the child gets nil passed into the block, and the parent gets an IO object with the child's standard input and/or output connected to it, as shown here:

 

 IO.popen("-") do |mypipe|   if mypipe     puts "I'm the parent: pid = #{ Process.pid} "     listen = mypipe.gets     puts listen   else     puts "I'm the child: pid = #{ Process.pid} "   end end # Prints: #   I'm the parent: pid = 10580 #   I'm the child: pid = 10582 

Also, the pipe method returns a pair of pipe ends connected to each other. Here, we create a pair of threads and let one pass a message to the other (the first message that Samuel Morse sent over the telegraph):

 

 pipe = IO.pipe reader = pipe[0] writer = pipe[1] str = nil thread1 = Thread.new(reader,writer) do |reader,writer|   # writer.close_write   str = reader.gets   reader.close end thread2 = Thread.new(reader,writer) do |reader,writer|   # reader.close_read   writer.puts("What hath God wrought?")   writer.close end thread1.join thread2.join puts str         # What hath God wrought? 

Performing Special I/O Operations

It is possible to do lower-level I/O in Ruby. We will only mention the existence of these methods; if you need to use them, note that some of them will be highly machine specific (varying even between different versions of Unix).

The ioctl method ("I/O control") will accept two arguments. The first is an integer specifying the operation to be done. The second is either an integer or a string representing a binary number.

The fcntl method is also for low-level control of file-oriented streams in a system-dependent manner. It takes the same kinds of parameters as ioctl.

The select method (in the Kernel module) will accept up to four parameters; the first is the read-array, and the last three are optional (write-array, error-array, and the timeout value. When input is available from one or more devices in the read-array, or one or more devices in the write-array are ready, the call will return an array of three elements representing the respective arrays of devices that are ready for I/O.

The Kernel method syscall takes at least one integer parameter (and up to nine string or integer parameters in all). The first parameter specifies the I/O operation to be done.

The fileno method returns an old-fashioned file descriptor associated with an I/O stream. This is the least system dependent of all the methods mentioned. Here's an example:

 

 desc = $stderr.fileno      # 2 

Manipulating Pathnames

When manipulating pathnames, the first things to be aware of are the class methods File.dirname and File.basename; these work like the Unix commands of the same names and return the directory name and the filename, respectively. If an extension is specified as a second parameter to basename, that extension will be removed:

 

 str = "/home/dave/podbay.rb" dir = File.dirname(str)           # "/home/dave" file1 = File.basename(str)        # "podbay.rb" file2 = File.basename(str,".rb")  # "podbay" 

Note that although these are methods of File, they are really simply doing string manipulation.

A comparable method is File.split, which returns these two components (directory and filename) in a two-element array:

 

 info = File.split(str)        # ["/home/dave","podbay.rb"] 

The expand_path class method will expand a relative pathname, converting it to an absolute path. If the operating system understands such idioms as ~ and ~user, these will be expanded also. Here's an example:

 

 Dir.chdir("/home/poole/personal/docs") abs = File.expand_path("../../misc")    # "/home/poole/misc" 

Given an open file, the path instance method will return the pathname used to open the file:

 

 file = File.new("../../foobar") name = file.path                 # "../../foobar" 

The constant File::Separator gives the character used to separate pathname components (typically a backslash for Windows and a slash for Unix). An alias is File::SEPARATOR.

The class method join uses this separator to produce a path from a list of directory components:

 

 path = File.join("usr","local","bin","someprog") # path is "usr/local/bin/someprog" # Note that it doesn't put a separator on the front! 

Don't fall into the trap of thinking that File.join and File.split are somehow inverse operations. They're not.

Command-Level File Manipulation

Very often we need to manipulate files in a manner similar to the way we would at a command line. That is, we need to copy, delete, rename, and so on. Many of these capabilities are built-in methods; a few are added by the ftools library.

To delete a file, we can use File.delete or its synonym File.unlink, like so:

 

 File.delete("history") File.unlink("toast") 

To rename a file, we can use File.rename as follows:

 

 File.rename("Ceylon","SriLanka") 

File links (hard and symbolic) can be created using File.link and File.symlink, respectively:

 

 File.link("/etc/hosts","/etc/hostfile")   # hard link File.symlink("/etc/hosts","/tmp/hosts")   # symbolic link 

We can truncate a file to zero bytes (or any other specified number) by using the truncate instance method:

 

 File.truncate("myfile",1000)    # Now at most 1000 bytes 

Two files may be compared by means of the compare method (cmp is the alias):

 

 require "ftools" same = File.compare("alpha","beta")  # true 

The method syscopy will efficiently copy a file to a new name or location. A similar method is copy, which has an optional flag parameter to write error messages to standard error (cp is the alias):

 

 require "ftools" File.syscopy("gamma","delta")      # Copies gamma to delta File.syscopy("gamma","/tmp")       # Creates /tmp/gamma # Copy epsilon to theta and log any errors. File.copy("epsilon","theta", true) 

A file may be moved with the move method (the alias is mv). Like copy, it also has an optional verbose-flag:

 

 require "ftools" File.move("/tmp/names","/etc")     # Move to new directory File.move("colours","colors")      # Just a rename 

The safe_unlink method will delete the specified file or files, first trying to make the files writable so as to avoid errors. If the last parameter is true or false, that value will be taken as the verbose-flag. Here's an example:

 

 require "ftools" File.safe_unlink("alpha","beta","gamma") # Log errors on the next two files File.safe_unlink("delta","epsilon",true) 

Finally, the install method basically does a syscopy, except that it first checks that the file either does not exist or has different content:

 

 require "ftools" File.install("foo.so","/usr/lib") # Existing foo.so will not be overwritten # if it is the same as the new one. 

Grabbing Characters from the Keyboard

We use the term "grabbing" here because we sometimes want to process a character as soon as it is pressed rather than buffer it and wait for a newline to be entered.

This can be done in both Unix variants and Windows variants. Unfortunately, the two methods are completely unrelated to each other.

The Unix version is straightforward. We use the well-known technique of putting the terminal in raw mode (and we usually turn off echoing at the same time):

 

 system("stty raw -echo")  # Raw mode, no echo char = STDIN.getc system("stty -raw echo")  # Reset terminal mode 

In the Windows world, we need to write a C extension for this. An alternative for now is to use a small feature of the Win32API library, shown here:

 

 require 'Win32API' char = Win32API.new("crtdll", "_getch", [], 'L').Call 

This is obviously not pretty, but it works.

Reading an Entire File into Memory

To read an entire file into an array, you need not even open the file. The method IO.readlines will do this, opening and closing the file on its own:

 

 arr = IO.readlines("myfile") lines = arr.size puts "myfile has #{ lines}  lines in it." longest = arr.collect { |x| x.length} .max puts "The longest line in it has #{ longest}  characters." 

Iterating over a File by Line

To iterate over a file a line at a time, we can use the class method IO.foreach or the instance method each. In the former case, the file need not be opened in our code. Here's an example:

 

 # Print all lines containing the word "target" IO.foreach("somefile") do |line|   puts line if line =~ /target/ end # Another way... file = File.new("somefile") file.each do |line|   puts line if line =~ /target/ end 

Note that each_line is an alias for each.

Iterating over a File by Byte

To iterate a byte at a time, use the each_byte instance method. Remember that it feeds a character (that is, an integer) into the block; use the chr method if you need to convert to a "real" character. Here's an example:

 

 file = File.new("myfile") e_count = 0 file.each_byte do |byte|   e_count += 1 if byte == ?e end 

Treating a String As a File

Sometimes people want to know how to treat a string as though it were a file. The answer depends on the exact meaning of the question.

An object is defined mostly in terms of its methods. The following code shows an iterator applied to an object called source; with each iteration, a line of output is produced. Can you tell the type of source by reading this fragment?

 

 source.each do |line|   puts line end 

Actually, source could be a file, or it could be a string containing embedded newlines. Therefore, in cases like these, a string can trivially be treated as a file.

This leads naturally to the idea of writing something like an IOString class. We could do that here, but the exact design of such a class would depend on what you want to do with it. Should we inherit from String or from IO, for instance? A third possibility would be to create a library that adds methods to the String class (just as ftools extends File).

We won't attempt any full implementation here. We'll only make a skeleton to show one approach (see Listing 4.1). Fleshing it out would involve an exhaustive set of methods and rigorous error checking; the example here is neither complete nor robust.

Listing 4.1 Outline of an IOString Class
 class IOString < String   def initialize(str="")     @fptr = 0     self.replace(str)   end   def open     @fptr = 0   end   def truncate     self.replace("")     @fptr = 0   end   def seek(n)     @fptr = [n, self.size].min   end   def tell     @fptr   end   def getc     @fptr += 1     self[@fptr-1].to_i   end   def ungetc(c)     self[@fptr -= 1] = c.chr   end   def putc(c)     self[@fptr] = c.chr     @fptr += 1   end   def gets     s = ""     n = self.index("\n",@fptr)     s = self[@fptr..n].dup     @fptr += s.length     s   end   def puts(s)     self[@fptr..@fptr+s.length-1] = s     @fptr += s.length   end end ios = IOString.new("abcdefghijkl\nABC\n123") ios.seek(5) ios.puts("xyz") puts ios.tell      # 8 puts ios.dump      # "abcdexyzijkl\nABC\n123" c = ios.getc puts "c = #{ c} "    # c = 105 ios.ungetc(?w) puts ios.dump      # "abcdexyzwjkl\nABC\n123" puts "Ptr = #{ ios.tell} " s1 = ios.gets      # "wjkl" s2 = ios.gets      # "ABC" 

Reading Data Embedded in a Program

When you were 12 years old and you learned BASIC by copying programs out of magazines, you may have used a DATA statement for convenience. The information was embedded in the program, but it could be read as if it originated outside.

Should you ever want to, you can do much the same thing in Ruby. The directive __END__ at the end of a Ruby program signals that embedded data follows. This can be read using the global constant DATA, which is an IO object like any other. (Note that the __END__ marker must be at the beginning of the line on which it appears.) Here's an example:

 

 # Print each line backwards... DATA.each_line do |line|   puts line.reverse end __END__ A man, a plan, a canal... Panama! Madam, I'm Adam. ,siht daer nac uoy fI .drah oot gnikrow neeb ev'uoy 

Reading Program Source

Suppose you want to access the source of your own program. This can be done using a variation on a trick we used elsewhere (see the section titled "Reading Data Embedded in a Program").

The global constant DATA is an IO object that refers to the data following the __END__ directive. However, if you perform a rewind operation, it will reset the file pointer to the beginning of the program source.

The following program will produce a listing of itself with line numbers (it is not particularly useful, but maybe you can find some other good use for this capability):

 

 DATA.rewind num = 1 DATA.each_line do |line|   puts "#{ '%03d' % num}   #{ line} "   num += 1 end __END__ 

Note that the __END__ directive is necessary; without it, DATA cannot be accessed at all.

Performing Newline Conversion

One of the annoyances of dealing with different operating systems is that they may have different concepts of what an end-of-line character is. The "common sense" newline is a carriage return (CR) followed by a linefeed (LF); but in the earliest days of Unix, the decision was made to store only the linefeed, thus saving an entire byte per line of text. (This was when 512KB was a lot of memory.) Today we might expect that a text file is a text file, but in moving files between, say, a Unix machine and a Windows machine, we are bitten over and over by the newline problem.

Therefore, we offer a little solution here. The gets method will honor either kind of newline, and the puts method will always write in the native format. That means this same code will convert to native format on either kind of operating system. We show it here as a simple filter that reads standard input and writes to standard output:

 

 while line = gets   puts line end 

Another case might result from receiving an entire file in an unknown OS format. Whereas Unix variants use LF for their newline, and Windows versions use CR-LF, we have yet another possibility with Mac OS, which uses just CR. This very situation arises on the Web when a TEXTAREA is processed, among other times.

Here is one way to handle this situation, in which we wish to save the contents of a text area into a file on our Linux Web server:

 

 tmp=cgi.params["mytextarea"].to_s File.open("newfile","w") do |f|   newstring = tmp.gsub!(/\r\n/m,"\n") or               tmp.gsub!(/\r/m,"\n") or tmp   newstring.each {  |line| f.puts line } end 

The first gsub! looks for the CR-LF pair from a PC. If it finds none, it returns nil, meaning the or will allow the next gsub! to execute, which works on Mac OSbased files. If no carriage returns are found, the originally captured string is used. After conversion, the string is written line by line to a file.

Working with Temporary Files

In many circumstances we need to work with files that are all but anonymous. We don't want the trouble of naming them or making sure there is no name conflict, and we don't want to bother with deleting them.

All these issues are addressed in the Tempfile library. The new method (alias open) will take a base name as a "seed string" and will concatenate onto it the process ID and a unique sequence number. The optional second parameter is the directory to be used; it defaults to the value of environment variable TMPDIR, TMP, or TEMP, and finally the value "/tmp".

The resulting IO object may be opened and closed many times during the execution of the program. Upon termination of the program, the temporary file will be deleted.

The close method has an optional flag; if set to true, the file will be deleted immediately after it is closed (instead of waiting until program termination). The path method will return the actual pathname of the file, should you need it. Here's an example:

 

 require "tempfile" temp = Tempfile.new("stuff") name = temp.path              # "/tmp/stuff17060.0" temp.puts "Kilroy was here" temp.close # Later... temp.open str = temp.gets               # "Kilroy was here" temp.close(true)              # Delete it NOW 

Changing and Setting the Current Directory

The current directory may be determined by the use of Dir.pwd or its alias Dir.getwd; these abbreviations historically stand for print working directory and get working directory, respectively. In a Windows environment, the backslashes will probably show up as normal (forward) slashes.

The method Dir.chdir may be used to change the current directory. On Windows, the logged drive may appear at the front of the string. Here's an example:

 

 Dir.chdir("/var/tmp") puts Dir.pwd           # "/var/tmp" puts Dir.getwd         # "/var/tmp" 

Changing the Current Root

On most Unix variants, it is possible to change the current process's idea of where the root or "slash" is. This is typically done for security reasons (for example, when running unsafe or untested code). The chroot method will set the new root to the specified directory:

 

 Dir.chdir("/home/guy/sandbox/tmp") Dir.chroot("/home/guy/sandbox") puts Dir.pwd                     # "/tmp" 

Iterating over Directory Entries

The class method foreach is an iterator that will successively pass each directory entry into the block. The instance method each behaves the same way. Here's an example:

 

 Dir.foreach("/tmp") {  |entry| puts entry } dir = Dir.new("/tmp") dir.each  {  |entry| puts entry } 

Both of the preceding code fragments will print the same output (the names of all files and subdirectories in /tmp).

Getting a List of Directory Entries

The class method Dir.entries will return an array of all the entries in the specified directory:

 

 list = Dir.entries("/tmp")  # %w[. .. alpha.txt beta.doc] 

Creating a Chain of Directories

Sometimes we want to create a chain of directories where the intermediate directories themselves don't necessarily exist yet. At the Unix command line, we would use mkdir -p for this.

In Ruby code, we can do this by using the makedirs method, which the ftools library adds to File:

 

 require "ftools" File.makedirs("/tmp/these/dirs/need/not/exist") 

Deleting a Directory Recursively

In the Unix world, we can type rm -rf dir at the command line and the entire subtree starting with dir will be deleted. Obviously, we should exercise caution in doing this.

If you need a piece of code to accomplish this, here it is:

 

 def delete_all(dir)   Dir.foreach(dir) do |e|     # Don't bother with . and ..     next if [".",".."].include? e     fullname = dir + File::Separator + e     if FileTest::directory?(fullname)       delete_all(fullname)     else       File.delete(fullname)     end   end   Dir.delete(dir) end delete_all("dir1")   # Remove dir1 and everything under it! 

Finding Files and Directories

Here, we make use of the standard library find.rb to create a method that will find one or more files and return the list of files as an array. The first parameter is the starting directory; the second is either a filename (that is, a string) or a regular expression:

 

 require "find" def findfiles(dir, name)   list = []   Find.find(dir) do |path|     Find.prune if [".",".."].include? path     case name       when String         list << path if File.basename(path) == name       when Regexp         list << path if File.basename(path) =~ name     else       raise ArgumentError     end   end   list end findfiles "/home/hal", "toc.txt" # ["/home/hal/docs/toc.txt", "/home/hal/misc/toc.txt"] findfiles "/home", /^[a-z]+.doc/ # ["/home/hal/docs/alpha.doc", "/home/guy/guide.doc", #  "/home/bill/help/readme.doc"] 


   

 

 



The Ruby Way
The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)
ISBN: 0672328844
EAN: 2147483647
Year: 2000
Pages: 119
Authors: Hal Fulton

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net