Deleting Files That Match a Regular Expression

Credit: Matthew Palmer

Problem

You have a directory full of files and you need to remove some of them. The patterns you want to match are too complex to represent as file globs, but you can represent them as a regular expression.

Solution

The Dir.entries method gives you an array of all files in a directory, and you can iterate over this array with #each. A method to delete the files matching a regular expression might look like this:

	def delete_matching_regexp(dir, regex)
	  Dir.entries(dir).each do |name|
	    path = File.join(dir, name)
	    if name =~ regex
	      ftype = File.directory?(path) ? Dir : File
	      begin
	        ftype.delete(path)
	      rescue SystemCallError => e
	        $stderr.puts e.message
	      end
	    end
	  end
	end

Here's an example. Let's create a bunch of files and directories beneath a temporary directory:


	require 'fileutils'
	tmp_dir = 'tmp_buncha_files'
	files = ['A', 'A.txt', 'A.html', 'p.html', 'A.html.bak']
	directories = ['text.dir', 'Directory.for.html']

	Dir.mkdir(tmp_dir) unless File.directory? tmp_dir
	files.each { |f| FileUtils.touch(File.join(tmp_dir,f)) }
	directories.each { |d| Dir.mkdir(File.join(tmp_dir, d)) }

Now let's delete some of those files and directories. We'll delete a file or directory if its name starts with a capital letter, and if its extension (the string after its last period) is at least four characters long. This corresponds to the regular expression /^[A-Z].*\.[^.]{4,}$/:

	Dir.entries(tmp_dir)
	# => [".", "..", "A", "A.txt", "A.html", "p.html", "A.html.bak",
	# "text.dir", "Directory.for.html"]

	delete_matching_regexp(tmp_dir, /^[A-Z].*\.[^.]{4,}$/)

	Dir.entries(tmp_dir)
	# => [".", "..", "A", "A.txt", "p.html", "A.html.bak", "text.dir"]
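To see why only A.html and Directory.for.html were removed, it can help to test the regular expression against each name before deleting anything. The following is a minimal sketch: the names array simply repeats the entries created above, and nothing touches the filesystem.

```ruby
# Check which names the pattern would select, without touching any files.
regex = /^[A-Z].*\.[^.]{4,}$/
names = ['A', 'A.txt', 'A.html', 'p.html', 'A.html.bak',
         'text.dir', 'Directory.for.html']

matching = names.select { |name| name =~ regex }
matching
# => ["A.html", "Directory.for.html"]
```

A.html.bak survives because the text after its last period, bak, is only three characters long.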

Discussion

Like most good things in Ruby, iterating over a directory involves a code block. Dir.entries returns an array containing the name of every file and subdirectory it finds (including the special entries "." and ".."), and Array#each yields each name to the block. Our particular code block uses the regular expression match operator =~ to test every name against the regular expression, and File.delete or Dir.delete to remove the offending entries.

File.delete won't delete directories; for that, you need Dir.delete. So delete_matching_regexp uses the File.directory? predicate to check whether an entry is a directory. We also have error reporting, to report cases when we don't have permission to delete a file, or a directory isn't empty.
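The difference between the two delete methods, and the kind of error the rescue clause catches, can be seen in a short sketch. It assumes a scratch directory created with Dir.mktmpdir (from the tmpdir standard library); the names full and note.txt are made up for the illustration. The exact Errno subclass raised varies by platform, which is one reason to rescue the SystemCallError superclass.

```ruby
require 'tmpdir'
require 'fileutils'

errors = []          # classes of the exceptions we rescue
still_there = nil    # whether the directory exists at the end

Dir.mktmpdir do |tmp|
  full = File.join(tmp, 'full')                  # hypothetical nonempty directory
  Dir.mkdir(full)
  FileUtils.touch(File.join(full, 'note.txt'))

  # File.delete refuses to remove a directory.
  begin
    File.delete(full)
  rescue SystemCallError => e
    errors << e.class
  end

  # Dir.delete refuses to remove a directory that still has contents.
  begin
    Dir.delete(full)
  rescue SystemCallError => e
    errors << e.class
  end

  # Once the directory is empty, Dir.delete succeeds.
  File.delete(File.join(full, 'note.txt'))
  Dir.delete(full)
  still_there = File.exist?(full)
end
```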

Of course, once we've got this basic "find matching files" thing going, there's no reason why we have to limit ourselves to deleting the matched files. We can move them to somewhere new:

	def move_matching_regexp(src, dest, regex)
	  Dir.entries(src).each do |name|
	    File.rename(File.join(src, name), File.join(dest, name)) if name =~ regex
	  end
	end

Or we can append a suffix to them:

	def append_matching_regexp(dir, suffix, regex)
	  Dir.entries(dir).each do |name|
	    if name =~ regex
	      File.rename(File.join(dir, name), File.join(dir, name+suffix))
	    end
	  end
	end

Note the common code in both of those implementations. We can factor it out into yet another method that takes a block:

	def each_matching_regexp(dir, regex)
	 Dir.entries(dir).each { |name| yield name if name =~ regex }
	end

We no longer have to write the Dir.entries loop and the regular expression test each time; we just need to tell each_matching_regexp what to do with the files it finds:

	def append_matching_regexp(dir, suffix, regex)
	  each_matching_regexp(dir, regex) do |name|
	    File.rename(File.join(dir, name), File.join(dir, name+suffix))
	  end
	end
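The same helper works for the other operations. As a sketch, here is move_matching_regexp rewritten on top of each_matching_regexp. The helper is repeated so the example runs on its own, and the directory and file names in the usage section are invented for the illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Yield the name of every entry in dir that matches regex.
def each_matching_regexp(dir, regex)
  Dir.entries(dir).each { |name| yield name if name =~ regex }
end

# Move every matching entry from src to dest.
def move_matching_regexp(src, dest, regex)
  each_matching_regexp(src, regex) do |name|
    File.rename(File.join(src, name), File.join(dest, name))
  end
end

# Usage with hypothetical names in a scratch directory.
moved = nil
Dir.mktmpdir do |tmp|
  src  = File.join(tmp, 'src')
  dest = File.join(tmp, 'dest')
  [src, dest].each { |d| Dir.mkdir(d) }
  %w[a.log b.log c.txt].each { |f| FileUtils.touch(File.join(src, f)) }

  move_matching_regexp(src, dest, /\.log$/)
  moved = Dir.entries(dest).sort - ['.', '..']
end
moved
# => ["a.log", "b.log"]
```

File.rename raises a SystemCallError when src and dest are on different filesystems; FileUtils.mv is the usual alternative in that case.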

This is all well and good, but these methods only manipulate files directly beneath the directory you specify. "I've got a whole tree full of files I want to get rid of!" I hear you cry. For that, you should use Find.find (from the find standard library) instead of Dir.entries. Apart from that change, the implementation is nearly identical to delete_matching_regexp:

	require 'find'

	def delete_matching_regexp_recursively(dir, regex)
	  Find.find(dir) do |path|
	    # Match against the last path component only, as before.
	    name = File.basename(path)
	    if name =~ regex
	      ftype = File.directory?(path) ? Dir : File
	      begin
	        ftype.delete(path)
	      rescue SystemCallError => e
	        $stderr.puts e.message
	      end
	    end
	  end
	end

If you want to recursively delete the contents of directories that match the regular expression (even if the contents themselves don't match), use FileUtils.rm_rf instead of Dir.delete.
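One way that variation might look is sketched below. The method name delete_matching_trees is hypothetical, and skipping the starting directory itself is an assumption made here for safety rather than something the recipe specifies. Find.prune tells Find.find not to descend into the path that was just removed.

```ruby
require 'find'
require 'fileutils'

# Delete every matching file or directory below dir; a matching directory
# is removed along with everything inside it.
def delete_matching_trees(dir, regex)
  Find.find(dir) do |path|
    next if path == dir                       # assumption: never delete the root itself
    next unless File.basename(path) =~ regex
    FileUtils.rm_rf(path)                     # removes files and whole directory trees
    Find.prune                                # skip the contents of what was just removed
  end
end
```

Because FileUtils.rm_rf ignores errors, this version reports nothing when a deletion fails; FileUtils.rm_r raises instead, if you want to keep the error reporting.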

See Also

  • Dir.delete will only remove an empty directory; see Recipe 6.18 for information on how to remove one that's not empty
  • Recipe 6.20, "Finding the Files You Want"


Ruby Cookbook (Cookbooks (O'Reilly))
ISBN: 0596523696