The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming

18.2. Network Clients

Sometimes the server is a well-known entity or is using a well-established protocol. In this case, we need simply to design a client that will talk to this server in the way it expects.

This can be done with TCP or UDP, as we saw in section 18.1. But it is common to use other higher-level protocols such as HTTP or SNMP. We'll look at a few examples here.

18.2.1. Retrieving Truly Random Numbers from the Web

Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.

John von Neumann

There is a rand function in Kernel to return a random number; but there is a fundamental problem with it. It isn't really random. If you are a mathematician, cryptographer, or other nitpicker, you will refer to this as a pseudorandom number generator because it uses algebraic methods to generate numbers in a deterministic fashion. These numbers "look" random to the casual observer, and may even have the correct statistical properties, but the sequences do repeat eventually, and we can even repeat a sequence purposely (or accidentally) by using the same seed.
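The repeatability described above is easy to observe. The following minimal sketch seeds two generators with the same value (the seed 42 is arbitrary, chosen only for illustration) and compares the die rolls they produce.

```ruby
# Two generators seeded identically produce identical "random" sequences.
gen_a = Random.new(42)
gen_b = Random.new(42)

rolls_a = Array.new(5) { gen_a.rand(1..6) }
rolls_b = Array.new(5) { gen_b.rand(1..6) }

puts rolls_a.inspect
puts rolls_b.inspect
puts "Same sequence? #{rolls_a == rolls_b}"
```

Anyone who learns the seed can reproduce the whole sequence, which is why a pseudorandom generator alone is a poor fit for lotteries or cryptography.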

But processes in nature are considered to be truly random. That is why in state lotteries, winners of millions of dollars are picked based on the chaotic motions of little white balls. Other sources of randomness are radioactive emissions or atmospheric noise.

There are sources of random numbers on the Web. One of these is www.random.org, which we use in this example.

The sample code in Listing 18.4 simulates the throwing of five ordinary (six-sided) dice. Of course, gaming fans could extend it to 10-sided or 20-sided, but the ASCII art would get tedious.

Listing 18.4. Casting Dice at Random

require 'net/http'

HOST = "www.random.org"
RAND_URL = "/cgi-bin/randnum?col=5&"

def get_random_numbers(count=1, min=0, max=99)
  path = RAND_URL + "num=#{count}&min=#{min}&max=#{max}"
  connection = Net::HTTP.new(HOST)
  response = connection.get(path)      # an HTTPResponse object
  if response.code == "200"
    # The body holds the numbers, separated by whitespace
    response.body.split.collect { |num| num.to_i }
  else
    []
  end
end

DICE_LINES = [
  "+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ ",
  "|     | |  *  | | *   | | * * | | * * | | * * | ",
  "|  *  | |     | |  *  | |     | |  *  | | * * | ",
  "|     | |  *  | |   * | | * * | | * * | | * * | ",
  "+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ "
]

DIE_WIDTH = DICE_LINES[0].length/6

def draw_dice(values)
  DICE_LINES.each do |line|
    for v in values
      print line[(v-1)*DIE_WIDTH, DIE_WIDTH]
      print " "
    end
    puts
  end
end

draw_dice(get_random_numbers(5, 1, 6))

In the previous code, we're using the Net::HTTP class to communicate directly with a web server. Think of it as a highly special-purpose web browser. We form the URL and try to connect; when we make a connection, we get a response and a piece of data; if the response indicates that all is well, we can parse the data that we got back. Exceptions are assumed to be handled by the caller.

Let's look at a variation on the same basic idea. What if we really wanted to use these random numbers in an application? Because the CGI at the server end allows us to specify how many numbers we want returned, it's logical to buffer them. It's a fact of life that a delay is usually involved when accessing a remote site. We want to fill a buffer so that we are not making frequent web accesses and incurring delays.

In Listing 18.5, we implement this variation. The buffer is filled by a separate thread, so a caller has to wait only when the buffer is completely empty. The buffer size and the "low water mark" (@slacksize) are both tunable; appropriate real-world values for them depend on the reachability (ping time) of the server and on how often the application requests a random number from the buffer.

Listing 18.5. A Buffered Random Number Generator

require "net/http"
require "thread"

class TrueRandom

  def initialize(min=nil,max=nil,buff=nil,slack=nil)
    @buffer = []
    @site = "www.random.org"
    # Set real defaults if not specified
    @min = min || 0
    @max = max || 1
    @bufsize = buff || 1000
    @slacksize = slack || 300
    @mutex = Mutex.new
    # Build the URL before starting the thread that uses it
    @url  = "/cgi-bin/randnum" +
            "?num=#@bufsize&min=#@min&max=#@max&col=1"
    @thread = Thread.new { fillbuffer }
  end

  def fillbuffer
    h = Net::HTTP.new(@site, 80)
    resp = h.get(@url)
    numbers = resp.body.split
    # Guard the buffer, since rand may be shifting from it right now
    @mutex.synchronize { @buffer += numbers }
  end

  def rand
    num = nil
    low = false
    @mutex.synchronize do
      num = @buffer.shift
      low = @buffer.size < @slacksize
    end
    if low
      if ! @thread.alive?
        @thread = Thread.new { fillbuffer }
      end
    end
    if num == nil
      if @thread.alive?
        @thread.join
      else
        @thread = Thread.new { fillbuffer }
        @thread.join
      end
      @mutex.synchronize { num = @buffer.shift }
    end
    num.to_i
  end

end

t = TrueRandom.new(1,6,1000,300)
count = {1=>0, 2=>0, 3=>0, 4=>0, 5=>0, 6=>0}

10000.times do |n|
  x = t.rand
  count[x] += 1
end

p count

# In one run:
# {4=>1692, 5=>1677, 1=>1678, 6=>1635, 2=>1626, 3=>1692}
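The low-water-mark idea in Listing 18.5 can also be explored without a network connection. The following is only a sketch under simplifying assumptions: it refills synchronously (no thread), and the class name RefillingBuffer and the fetch block are our own inventions standing in for the HTTP request.

```ruby
# A buffer that refills itself from any callable source once it drains
# down to the low-water mark. The block passed to new is a hypothetical
# stand-in for the web request made by fillbuffer in Listing 18.5.
class RefillingBuffer
  attr_reader :refills

  def initialize(bufsize, slacksize, &fetch)
    @bufsize   = bufsize
    @slacksize = slacksize
    @fetch     = fetch
    @buffer    = []
    @refills   = 0
  end

  def next_value
    refill if @buffer.size <= @slacksize   # low-water mark reached
    @buffer.shift
  end

  private

  def refill
    @buffer.concat(@fetch.call(@bufsize))
    @refills += 1
  end
end

# Local pseudorandom source used purely for demonstration
buf = RefillingBuffer.new(10, 3) { |n| Array.new(n) { Kernel.rand(1..6) } }
25.times { buf.next_value }
puts "Refilled #{buf.refills} times"
```

Raising the slack size trades more frequent fetches for a smaller chance of ever finding the buffer empty, which is the same tuning decision described for @slacksize.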

18.2.2. Contacting an Official Timeserver

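As we promised, here's a bit of code to contact a timeserver on the Net. The server in question speaks the Time Protocol (RFC 868, offered on the well-known "time" port, 37), which is far simpler than NTP (the Network Time Protocol). We do this by means of a telnet client. The following code is adapted from a piece of code by Dave Thomas.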

require "net/telnet"

timeserver = "www.fakedomain.org"

local = Time.now.strftime("%H:%M:%S")
tn = Net::Telnet.new("Host"       => timeserver,
                     "Port"       => "time",
                     "Timeout"    => 60,
                     "Telnetmode" => false)
msg = tn.recv(4).unpack('N')[0]
# Convert to epoch
remote = Time.at(msg - 2208988800).strftime("%H:%M:%S")

puts "Local : #{local}"
puts "Remote: #{remote}"

We establish a connection and grab four bytes. These represent a 32-bit quantity in network byte order (big endian): the number of seconds since January 1, 1900. We subtract 2,208,988,800 (the number of seconds between that date and the UNIX epoch of January 1, 1970) and then convert the result to a Time object.

Note that we didn't use a real timeserver name. This is because the usefulness of such a server frequently depends on your geographic location. Furthermore, many of these have access restrictions and may require permission or at least notification before they are used. A web search should turn up an open-access timeserver less than 1,000 miles from you.
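The same four-byte exchange can be done with a plain TCP socket. This is a minimal sketch, assuming a server that answers on TCP port 37; the method names decode_time and remote_time are ours, and the final lines exercise only the byte conversion so that no server is needed.

```ruby
require "socket"

# Seconds between 1900-01-01 (Time Protocol epoch) and 1970-01-01 (UNIX epoch)
EPOCH_OFFSET = 2_208_988_800

# Convert the four raw bytes sent by a timeserver into a Time object.
def decode_time(bytes)
  raise ArgumentError, "expected 4 bytes" unless bytes && bytes.bytesize == 4
  Time.at(bytes.unpack1("N") - EPOCH_OFFSET)
end

# Ask a server for the time on TCP port 37. The host is up to the caller.
def remote_time(host, port = 37)
  TCPSocket.open(host, port) { |sock| decode_time(sock.read(4)) }
end

# Offline demonstration: encode a known instant and decode it again.
sample = [Time.utc(2000, 1, 1).to_i + EPOCH_OFFSET].pack("N")
puts decode_time(sample).utc
```

Keeping the conversion separate from the socket code makes it possible to check the arithmetic without contacting anyone's server.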

18.2.3. Interacting with a POP Server

The Post Office Protocol (POP) is commonly used by mail servers. Ruby's POP3 class enables you to examine the headers and bodies of all messages waiting on a server and process them as you see fit. After processing, you can easily delete one or all of them.

The Net::POP3 class must be instantiated with the name or IP address of the server; the port number defaults to 110. No connection is established until the method start is invoked (with the appropriate username and password).

Invoking the method mails on this object will return an array of objects of class POPMail. (There is also an iterator each that will run through these one at a time.)

A POPMail object corresponds to a single email message. The header method will retrieve the message's headers; the method all will retrieve the header and the body. (There are also other usages of all as we'll see shortly.)

A code fragment is worth a thousand words. Here's a little example that will log on to the server and print the subject line for each email:

require "net/pop"

pop = Net::POP3.new("pop.fakedomain.org")
pop.start("gandalf", "mellon")     # user, password
pop.mails.each do |msg|
  # The header is one string; split it into lines before searching
  puts msg.header.lines.grep(/^Subject: /)
end
pop.finish

The delete method will delete a message from the server. (Some servers require that finish be called to close the POP connection before such an operation becomes final.) Here is the world's most trivial spam filter:

require "net/pop"

pop = Net::POP3.new("pop.fakedomain.org")
pop.start("gandalf", "mellon")     # user, password
pop.mails.each do |msg|
  if msg.all =~ /make money fast/
    msg.delete
  end
end
pop.finish

We'll mention that start can be called with a block. By analogy with File.open, it opens the connection, executes the block, and closes the connection.
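As a sketch of the block form, the following uses the class method Net::POP3.start, which takes the server, port, account, and password and passes the open session to the block. The helper names subject_of and list_subjects are our own, and the server and credentials would be supplied by the caller.

```ruby
# Pull the subject out of a raw header string (nil if there is none).
def subject_of(header)
  line = header.each_line.find { |l| l =~ /\ASubject:/i }
  line ? line.sub(/\ASubject:\s*/i, "").strip : nil
end

# Print the subject of every waiting message. The connection is closed
# automatically when the block exits, even if an exception is raised.
def list_subjects(host, user, password)
  require "net/pop"   # loaded here so subject_of can be used on its own
  Net::POP3.start(host, 110, user, password) do |pop|
    pop.each_mail { |msg| puts subject_of(msg.header) }
  end
end

# Offline demonstration of the header parsing only
puts subject_of("From: frodo@fake1.org\r\nSubject: Second breakfast\r\n")
```

A call such as list_subjects("pop.fakedomain.org", "gandalf", "mellon") would then do the work of the earlier example without an explicit finish.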

The all method can also be called with a block. This will simply iterate over the lines in the email message; it is equivalent to calling each_line on the string resulting from all.

# Print each line backwards... how useful!
msg.all { |line| print line.reverse }
# Same thing...
msg.all.each_line { |line| print line.reverse }

We can also pass an object into the all method. In this case, it will call the append operator (<<) repeatedly for each line in the string. Because this operator is defined differently for different objects, the behavior may be radically different, as shown here:

arr = []          # Empty array
str = "Mail: "    # String
out = $stdout     # IO object

msg.all(arr)      # Build an array of lines
msg.all(str)      # Concatenate onto str
msg.all(out)      # Write to standard output
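The effect of the append operator on different receivers can be tried without a mail server. In this sketch, append_lines is a hypothetical stand-in for what all does with its argument; it is not part of the library.

```ruby
require "stringio"

# Hypothetical stand-in for POPMail#all(dest): append each line with <<.
def append_lines(lines, dest)
  lines.each { |line| dest << line }
  dest
end

lines = ["Subject: Hello\r\n", "\r\n", "Body text\r\n"]

arr = append_lines(lines, [])                     # array of three lines
str = append_lines(lines, String.new("Mail: "))   # one long string
io  = append_lines(lines, StringIO.new)           # written like a file

puts arr.size
puts str
puts io.string == lines.join
```

Any object that responds to << will do, so a log file, a socket, or a custom filter class could be passed in just as easily.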

Finally, we'll give you a way to return only the body of the message, ignoring all headers.

module Net

  class POPMail

    def body
      # Skip header bytes
      self.all[self.header.size..-1]
    end

  end

end

This doesn't have all the properties that all has, but it could be extended. We'll leave that to you.

For those who prefer IMAP to POP3, see section 18.2.5, "Interacting with an IMAP Server."

18.2.4. Sending Mail with SMTP

A child of five could understand this. Fetch me a child of five.

Groucho Marx

The Simple Mail Transfer Protocol (SMTP) may seem like a misnomer. If it is "simple," it is only by comparison with more complex protocols.

Of course, the smtp.rb library shields the programmer from most of the details of the protocol. However, we have found that the design of this library is not entirely intuitive and perhaps overly complex (and we hope it will change in the future). In this section, we try to present a few examples to you in easily digested pieces.

The Net::SMTP class has two class methods, new and start. The new method takes two parameters: the name of the server (defaulting to localhost) and the port number (defaulting to the well-known port 25).

The start method takes these parameters:

server is the IP name of the SMTP server, defaulting to "localhost".
port is the port number, defaulting to 25.
domain is the domain of the mail sender, defaulting to ENV["HOSTNAME"].
account is the username, defaulting to nil.
password is the user password, defaulting to nil.
authtype is the authorization type, defaulting to :cram_md5.

Many or most of these parameters may be omitted under normal circumstances.

If start is called "normally" (without a block), it returns an object of class SMTP. If it is called with a block, that object is passed into the block as a convenience.

An SMTP object has an instance method called sendmail, which will typically be used to do the work of mailing a message. It takes three parameters:

source is a string or array (or anything with an each iterator returning one string at a time).
sender is a string that will appear in the "from" field of the email.
recipients is a string or an array of strings representing the addressee(s).

Here is an example of using the class methods to send an email:

require 'net/smtp'

msg = <<EOF
Subject: Many things

"The time has come," the Walrus said,
"To talk of many things --
Of shoes, and ships, and sealing wax,
Of cabbages and kings;
And why the sea is boiling hot,
And whether pigs have wings."
EOF

Net::SMTP.start("smtp-server.fake.com") do |smtp|
  smtp.sendmail msg, 'walrus@fake1.com', 'alice@fake2.com'
end

Because the string Subject: was specified at the beginning of the string, Many things will appear as the subject line when the message is received.
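Other header lines work the same way: they come first, and a blank line separates them from the body. The following sketch assembles such a string; build_message and its keyword arguments are illustrative names of our own, not part of net/smtp.

```ruby
# Assemble a minimal message: header lines, a blank line, then the body.
# The helper name and its keyword arguments are illustrative only.
def build_message(from:, to:, subject:, body:)
  headers = [
    "From: #{from}",
    "To: #{Array(to).join(', ')}",
    "Subject: #{subject}"
  ]
  (headers + ["", body]).join("\r\n")
end

msg = build_message(from:    "walrus@fake1.com",
                    to:      ["alice@fake2.com"],
                    subject: "Many things",
                    body:    "\"The time has come,\" the Walrus said.")
puts msg
```

The resulting string can be handed to sendmail as its first argument. Note that the From and To headers are only what the reader sees; the addresses actually used for delivery are the sender and recipients passed to sendmail.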

There is also an instance method named start, which behaves much the same as the class method. Because new specifies the server, start doesn't have to specify it. This parameter is omitted, and the others are the same as for the class method. This gives us a similar example using an SMTP object.

require 'net/smtp'

msg = <<EOF
Subject: Logic

"Contrariwise," continued Tweedledee,
"if it was so, it might be, and if it
were so, it would be; but as it isn't,
it ain't. That's logic."
EOF

smtp = Net::SMTP.new("smtp-server.fake.com")
smtp.start
smtp.sendmail msg, 'tweedledee@fake1.com', 'alice@fake2.com'

In case you are not confused yet, the instance method can also take a block.

require 'net/smtp'

msg = <<EOF
Subject: Moby-Dick

Call me Ishmael.
EOF

addressees = ['reader1@fake2.com', 'reader2@fake3.com']

smtp = Net::SMTP.new("smtp-server.fake.com")
smtp.start do |obj|
  obj.sendmail msg, 'narrator@fake1.com', addressees
end

As the example shows, the object passed into the block (obj) certainly need not be named the same as the receiver (smtp). We'll also take this opportunity to emphasize that the recipient can be an array of strings.

There is also an oddly named instance method called ready. This is much the same as sendmail, with some crucial differences. Only the sender and recipients are specified; the body of the message is constructed using an adapter, an object of class Net::NetPrivate::WriteAdapter, which has a write method as well as an append method. This adapter is passed into the block and can be written to in an arbitrary way.

require "net/smtp"

smtp = Net::SMTP.new("smtp-server.fake1.com")

smtp.start
smtp.ready("t.s.eliot@fake1.com", "reader@fake2.com") do |obj|
  obj.write "Let us go then, you and I,\r\n"
  obj.write "When the evening is spread out against the sky\r\n"
  obj.write "Like a patient etherised upon a table...\r\n"
end

Note here that the carriage-return linefeed pairs are necessary (if we actually want line breaks in the message). Those who are familiar with the actual details of the protocol should note that the message is "finalized" (with "dot" and "QUIT") without any action on our part.

We can append instead of calling write if we want:

smtp.ready("t.s.eliot@fake1.com", "reader@fake2.com") do |obj|
  obj << "In the room the women come and go\r\n"
  obj << "Talking of Michelangelo.\r\n"
end

Finally, we offer a minor improvement; we add a puts method that will tack on the newline for us.

class Net::NetPrivate::WriteAdapter
  def puts(str)
    # Build a new string rather than modifying the caller's argument
    self.write(str + "\r\n")
  end
end

This new method enables us to write this way:

smtp.ready("t.s.eliot@fake1.com", "reader@fake2.com") do |obj|
  obj.puts "We have lingered in the chambers of the sea"
  obj.puts "By sea-girls wreathed with seaweed red and brown"
  obj.puts "Till human voices wake us, and we drown."
end

If your needs are more specific than what we've detailed here, we suggest you do your own experimentation. And if you decide to write a new interface for SMTP, please feel free.

18.2.5. Interacting with an IMAP Server

The IMAP protocol is not the prettiest in the world, but it is superior to POP3 in many ways. Messages may be stored on the server indefinitely (individually marked as read or unread). Messages may be stored in hierarchical folders. These two facts alone are enough to establish IMAP as more powerful than POP3.

The standard library net/imap enables us to interact with an IMAP server. As you would expect, you connect to the server and then log in to an account with a username and password, as shown in the following code:

require 'net/imap'

host = "imap.hogwarts.edu"
user, pass = "lupin", "riddikulus"

imap = Net::IMAP.new(host)
begin
  imap.login(user, pass)
  # Or alternately:
  # imap.authenticate("LOGIN", user, pass)
rescue Net::IMAP::NoResponseError
  abort "Could not login as #{user}"
end

# Process as needed...

imap.logout   # break the connection

After you have a connection, you can do an examine on a mailbox; the default mailbox in IMAP is called INBOX. The responses method retrieves information about the mailbox, returning a hash of arrays (with the interesting data in the last element of each array). The following code finds the total number of messages in the mailbox ("EXISTS") and the number of recent messages ("RECENT"), that is, those that have arrived since the mailbox was last opened:

imap.examine("INBOX")
total = imap.responses["EXISTS"].last    # total messages
recent = imap.responses["RECENT"].last   # recent messages
imap.close                               # close the mailbox

Note that examine gives you read-only access to the mailbox. If, for example, you want to delete messages or make other changes, you should use select instead.
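To count the messages that do not yet carry the \Seen flag, one option is to search for UNSEEN in a mailbox opened with examine, which leaves the mailbox untouched. In the sketch below, unseen_count is our own helper, and FakeIMAP is a hypothetical stand-in that lets the code run without a server; a logged-in Net::IMAP object would be passed in real use.

```ruby
# Count messages without the \Seen flag. `imap` is expected to behave like
# a logged-in Net::IMAP object; only examine and search are used.
def unseen_count(imap, mailbox = "INBOX")
  imap.examine(mailbox)            # read-only access
  imap.search(["UNSEEN"]).size     # one sequence number per unseen message
end

# A tiny stand-in so the sketch can run without a server (hypothetical).
class FakeIMAP
  attr_reader :opened

  def examine(mailbox)
    @opened = mailbox
  end

  def search(keys)
    keys == ["UNSEEN"] ? [2, 5, 9] : []
  end
end

puts unseen_count(FakeIMAP.new)
```

Because examine is read-only, running this check does not itself mark anything as read.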

IMAP mailboxes are hierarchical and look similar to UNIX pathnames. You can use the create, delete, and rename methods to manipulate mailboxes:

imap.create("lists")
imap.create("lists/ruby")
imap.create("lists/rails")
imap.create("lists/foobar")

# Oops, kill that last one:

imap.delete("lists/foobar")

There are also methods named list (to list all the available mailboxes) and lsub (to list all the "active" or "subscribed" mailboxes). The status method will return information about the mailbox.

The search method will find messages according to specified criteria, and fetch will fetch a given message. Here is an example:

# The search keys are passed as a single array
msgs = imap.search(["TO", "lupin"])
msgs.each do |mid|
  env = imap.fetch(mid, "ENVELOPE")[0].attr["ENVELOPE"]
  puts "From #{env.from[0].name}     #{env.subject}"
end

The fetch command in the preceding code appears convoluted because it returns an array of objects, each of which keeps its data in a hash of attributes (attr). The envelope itself is similarly complex; some of its accessors are arrays of complex objects, and some are simply strings.

IMAP has the concept of UID (unique IDs) and sequence numbers for messages. Typically, methods such as fetch deal with sequence numbers and have counterparts such as uid_fetch that deal with UIDs. There is no room here to explain why both numbering systems are appropriate; if you are doing any significant programming with IMAP, however, you will need to know the difference (and never get them mixed up).

The net/imap library has extensive support for handling mailboxes, messages, attachments, and so on. For more details, refer to the online documentation at ruby-doc.org.

18.2.6. Encoding/Decoding Attachments

Files are usually attached to email or news messages in a special encoded form. More often than not, the encoding is base64, which can be encoded or decoded with the pack directive m:

bin = File.binread("new.gif")  # read as raw bytes
str = [bin].pack("m")          # str is now encoded
orig = str.unpack("m")[0]      # orig == bin
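One detail worth knowing is that the plain m directive breaks its output into lines, whereas m0 produces a single unbroken line. The sketch below uses a made-up byte string in place of a real file.

```ruby
data = "\x00\x01\x02binary\xFF".b * 10   # 100 bytes of sample binary data

wrapped = [data].pack("m")     # encoded text broken into lines
single  = [data].pack("m0")    # no line breaks at all

puts wrapped.lines.size
puts single.include?("\n")
puts wrapped.unpack1("m") == data
puts single.unpack1("m0") == data
```

The line-broken form is the one normally wanted inside a mail message, since mail transports limit line length; the unbroken form suits places such as HTTP headers.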

Older mail clients may prefer to work with uuencode and uudecode; in a case like this, an attachment is more a state of mind than anything else. The attachment is simply appended to the end of the email text, bracketed inside begin and end lines, with the begin line also specifying file permissions (which may be ignored) and filename. The pack directive u serves to encode a uuencoded string. The following code shows an example:

# Assume mailtext holds the text of the email

filename = "new.gif"
bin = File.binread(filename)
encoded = [bin].pack("u")

# The begin and end markers must be on lines of their own
mailtext << "begin 644 #{filename}\n"
mailtext << encoded
mailtext << "end\n"
# ...

On the receiving end, we would extract the encoded information and use unpack to decode it:

# ...
# Assume 'attached' has the encoded data (including the
# begin and end lines)

lines = attached.split("\n")
filename = lines[0].scan(/begin \d\d\d (.*)/).first.first
encoded = lines[1..-2].join("\n")
decoded = encoded.unpack("u").first     # Ready to write to filename
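The two halves can be combined into a round trip that runs without any mail at all. This is a sketch only; uu_attach and uu_extract are our own helper names, and the byte string stands in for the contents of a real file.

```ruby
# Wrap binary data in begin/end lines, as a uuencoded attachment.
def uu_attach(filename, bin)
  "begin 644 #{filename}\n" + [bin].pack("u") + "end\n"
end

# Recover the filename and the original bytes from such an attachment.
def uu_extract(attached)
  lines = attached.split("\n")
  filename = lines.first[/\Abegin \d{3} (.*)\z/, 1]
  data = lines[1..-2].join("\n").unpack1("u")
  [filename, data]
end

original = "GIF89a\x00\x01\xFE\xFF".b * 20   # stand-in for real file contents
name, data = uu_extract(uu_attach("new.gif", original))

puts name
puts data == original
```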

More modern mail readers usually use MIME format for email; even the text part of the email is wrapped (although the client strips all the header information before the user sees it).

A complete treatment of MIME would be lengthy and off-topic here. However, the following code shows a simple example of encoding and sending an email with a text portion and a binary attachment. The encoding for binaries is usually base64 as shown here.

require 'net/smtp'

def text_plus_attachment(subject,body,filename)
  marker = "MIME_boundary"
  middle = "--#{marker}\n"
  ending = "--#{marker}--\n"      # closing boundary
  content = "Content-Type: Multipart/Related; " +
            "boundary=#{marker}; " +
            "type=text/plain"
  # A blank line ends the message headers
  head1 = <<-EOF
MIME-Version: 1.0
#{content}
Subject: #{subject}

  EOF
  binary = File.binread(filename)
  encoded = [binary].pack("m")   # base64
  # A blank line separates the part's headers from its data
  head2 = <<EOF
Content-Description: "#{filename}"
Content-Type: image/gif; name="#{filename}"
Content-Transfer-Encoding: Base64
Content-Disposition: attachment; filename="#{filename}"

EOF
  # Return... (the text part has no headers, only the blank line)
  head1 + middle + "\n" + body + middle + head2 + encoded + ending
end

domain  = "someserver.com"
smtp    = "smtp.#{domain}"
user, pass = "elgar","enigma"

body = <<EOF
This is my email. There isn't
much to say. I attached a
very small GIF file here.

        -- Bob
EOF

mailtext = text_plus_attachment("Hi, there...",body,"new.gif")

Net::SMTP.start(smtp, 25, domain, user, pass, :plain) do |mailer|
  mailer.sendmail(mailtext, 'fromthisguy@wherever.com',
                  ['destination@elsewhere.com'])
end

18.2.7. Case Study: A Mail-News Gateway

Online communities keep in touch with each other in many ways. Two of the most traditional of these are mailing lists and newsgroups.

Not everyone wants to be on a mailing list that may generate dozens of messages per day; some would rather read a newsgroup and pick through the information at random intervals. On the other hand, some people are impatient with Usenet and want to get the messages before the electrons have time to cool off.

So we get situations in which a fairly small, private mailing list deals with the same subject matter as an unmoderated newsgroup open to the whole world. Eventually someone gets the idea for a mirror: a gateway between the two.

Such a gateway isn't appropriate in every situation, but in the case of the Ruby mailing list, it was and is. The newsgroup messages needed to be copied to the list, and the list emails needed to be posted on the newsgroup.

This need was addressed by Dave Thomas (in Ruby, of course), and we present the code with his kind permission in Listings 18.6 and 18.7.

But let's look at a little background first. We've taken a quick look at how email is sent and received, but how do we handle Usenet? As it turns out, we can access the newsgroups via a protocol called NNTP (Network News Transfer Protocol). This protocol was specified in RFC 977 by Brian Kantor and Phil Lapsley; Larry Wall, who later on gave us Perl, is the author of the well-known rn newsreader.

Ruby doesn't have a "standard" library to handle NNTP. However, a Japanese developer (known to us only as greentea) has written a nice library for this purpose.

The nntp.rb library defines a module NNTP containing a class called NNTPIO; it has instance methods connect, get_head, get_body, and post (among others). To retrieve messages, you connect to the server and call get_head and get_body, repetitively. (We're oversimplifying this.) Likewise, to post a message, you basically construct the headers, connect to the server, and call the post method.

These programs use the smtp library, which we've looked at previously. The original code also does some logging to track progress and record errors; we've removed this logging for greater simplicity.

The file params.rb is used by both programs. This file contains the parameters that drive the whole mirroring process: the names of the servers, account names, and so on. The following is a sample file that you will need to reconfigure for your own purposes. (The domain names used in the code, which all contain the word fake, are obviously intended to be fictitious.)

# These are various parameters used by the mail-news gateway

module Params
  NEWS_SERVER = "usenet.fake1.org"       # name of the news server
  NEWSGROUP   = "comp.lang.ruby"         # mirrored newsgroup
  LOOP_FLAG   = "X-rubymirror: yes"      # avoid loops
  LAST_NEWS_FILE = "/tmp/m2n/last_news"  # last msg num read

  SMTP_SERVER = "localhost"              # host for outgoing mail
  MAIL_SENDER = "myself@fake2.org"       # Name used to send mail

  # (On a subscription-based list, this
  # name must be a list member.)

  MAILING_LIST = "list@fake3.org"        # Mailing list address
end

The module Params merely contains constants that are accessed by the two programs. Most are self-explanatory; we'll only point out a couple of items here. First, the LAST_NEWS_FILE constant identifies a file where the most recent newsgroup message ID is stored; this is "state information," so that work is not duplicated or lost.

Perhaps even more important, the LOOP_FLAG constant defines a string that marks a message as having already passed through the gateway. This avoids infinite recursion and prevents the programmer from being mobbed by hordes of angry netizens who have received thousands of copies of the same message.
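The loop check amounts to a simple test on the header block. The following is a minimal sketch of the idea, not code from the gateway itself; the method names are ours, and LOOP_FLAG simply repeats the value of Params::LOOP_FLAG.

```ruby
LOOP_FLAG = "X-rubymirror: yes"   # same value as Params::LOOP_FLAG

# True if the header block shows the message already passed the gateway.
def already_mirrored?(head)
  head.each_line.any? { |line| line.start_with?(LOOP_FLAG) }
end

# Append the marker before handing a message to the other side.
def mark_as_mirrored(head)
  head = head + "\n" unless head.end_with?("\n")
  head + LOOP_FLAG + "\n"
end

head = "From: matz@fake1.org\nSubject: Hello\n"
puts already_mirrored?(head)
puts already_mirrored?(mark_as_mirrored(head))
```

Each direction of the gateway adds the marker on the way out and refuses anything that arrives already carrying it, so a message can cross the bridge only once.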

You might be wondering: How do we actually get the mail into the mail2news program? After all, it appears to read standard input. Well, the author recommends a setup like this: The sendmail program's .forward file first forwards all incoming mail to procmail. The .procmailrc file is set up to scan for messages from the mailing list and pipe them into the mail2news program. For the exact details of this, see the documentation associated with RubyMirror (found in the Ruby Application Archive). Of course, if you are on a non-UNIX system, you will likely have to come up with your own scheme for handling this situation.

Aside from what we've already said, we'll let the code stand on its own, as shown in Listings 18.6 and 18.7.

Listing 18.6. Mail-to-News

# mail2news: Take a mail message and post it
# as a news article

require "nntp"
include NNTP

require "params"

# Read in the message, splitting it into a
# heading and a body. Only allow certain
# headers through in the heading

HEADERS = %w{From Subject References Message-ID
             Content-Type Content-Transfer-Encoding Date}

allowed_headers = Regexp.new(%{^(#{HEADERS.join("|")}):})

# Read in the header. Only allow certain
# ones. Add a newsgroups line and an
# X-rubymirror line.

head = "Newsgroups: #{Params::NEWSGROUP}\n"
subject = "unknown"

while line = gets
  exit if line =~ /^#{Params::LOOP_FLAG}/o # shouldn't happen
  break if line =~ /^\s*$/
  next if line =~ /^\s/
  next unless line =~ allowed_headers

  # strip off the [ruby-talk:nnnn] prefix on the subject before
  # posting back to the news group
  if line =~ /^Subject:\s*(.*)/
    subject = $1

    # The following strips off the special ruby-talk number
    # from the front of mailing list messages before
    # forwarding them on to the news server.

    line.sub!(/\[ruby-talk:(\d+)\]\s*/, '')
    subject = "[#$1] #{line}"
    head << "X-ruby-talk: #$1\n"
  end
  head << line
end

head << "#{Params::LOOP_FLAG}\n"

body = ""
while line = gets
  body << line
end

msg = head + "\n" + body
msg.gsub!(/\r?\n/, "\r\n")

nntp = NNTPIO.new(Params::NEWS_SERVER)
raise "Failed to connect" unless nntp.connect
nntp.post(msg)

Listing 18.7. News-to-Mail

##
# Simple script to help mirror the comp.lang.ruby
# traffic on to the ruby-talk mailing list.
#
# We are called periodically (say once every 20 minutes).
# We look on the news server for any articles that have a
# higher message ID than the last message we'd sent
# previously. If we find any, we read those articles,
# send them on to the mailing list, and record the
# new highest message id.

require 'nntp'
require 'net/smtp'
require 'params'

include NNTP

##
# Send mail to the mailing-list. The mail must be
# from a list participant, although the From: line
# can contain any valid address
#

def send_mail(head, body)
  # The server is given to new, so start needs no argument
  smtp = Net::SMTP.new(Params::SMTP_SERVER)
  smtp.start
  smtp.ready(Params::MAIL_SENDER, Params::MAILING_LIST) do |a|
    a.write head
    a.write "#{Params::LOOP_FLAG}\r\n"
    a.write "\r\n"
    a.write body
  end
end

##
# We store the message ID of the last news we received.

begin
  last_news = File.open(Params::LAST_NEWS_FILE) {|f| f.read}.to_i
rescue
  last_news = nil
end

##
# Connect to the news server, and get the current
# message numbers for the comp.lang.ruby group
#

nntp = NNTPIO.new(Params::NEWS_SERVER)
raise "Failed to connect" unless nntp.connect
count, first, last = nntp.set_group(Params::NEWSGROUP)

##
# If we didn't previously have a number for the highest
# message number, we do now

if not last_news
  last_news = last
end

##
# Go to the last one read last time, and then try to get more.
# This may raise an exception if the number is for a
# nonexistent article, but we don't care.

begin
  nntp.set_stat(last_news)
rescue
end

##
# Finally read articles until there aren't any more,
# sending each to the mailing list.

new_last = last_news

begin
  loop do
    nntp.set_next
    head = ""
    body = ""
    new_last, = nntp.get_head do |line|
      head << line
    end

    # Don't send on articles that the mail2news program has
    # previously forwarded to the newsgroup (or we'd loop)
    next if head =~ %r{^X-rubymirror:}

    nntp.get_body do |line|
      body << line
    end

    send_mail(head, body)
  end
rescue
end

##
# And record the new high water mark

File.open(Params::LAST_NEWS_FILE, "w") do |f|
  f.puts  new_last
end unless new_last == last_news

18.2.8. Retrieving a Web Page from a URL

Suppose that, for whatever reason, we want to retrieve an HTML document from where it lives on the Web. Maybe our intent is to do a checksum and find whether it has changed so that our software can inform us of this automatically. Maybe our intent is to write our own web browser; this would be the proverbial first step on a journey of a thousand miles.

Here's the code:

require "net/http"

begin
  h = Net::HTTP.new("www.marsdrive.com", 80)    # MarsDrive Consortium
  resp = h.get("/index.html")
  data = resp.body
rescue => err
  puts "Error: #{err}"
  exit
end

puts "Retrieved #{data.lines.size} lines, #{data.bytesize} bytes"
# Process as desired...

We begin by instantiating an HTTP object with the appropriate domain name and port. (The port, of course, is usually 80.) We then do a get operation, which returns an HTTP response whose body is a string full of data. Here we don't actually test the response, but if there is any kind of error, we'll catch it and exit.

If no exception is raised (which is the normal case), the rescue clause is skipped, and we can expect to have an entire web page stored in the data string. We can then process it however we want.

What could go wrong here? What kind of errors do we catch? Actually, there are several. The domain name could be nonexistent or unreachable; there could be a redirect to another page (which we don't handle here); or we might get the dreaded 404 error (meaning that the document was not found). We'll leave this kind of error handling to you.
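As a starting point for that error handling, here is a minimal sketch that follows a limited number of redirects and treats any other non-success status as an error. The method name fetch and the limit of five redirects are our own choices, and the sketch assumes that the Location header holds an absolute URL.

```ruby
require "net/http"
require "uri"

# Fetch a URL, following at most `limit` redirects. Returns the body on
# success and raises on any other status (such as 404).
def fetch(url, limit = 5)
  raise ArgumentError, "too many HTTP redirects" if limit <= 0

  response = Net::HTTP.get_response(URI(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Assumes the Location header holds an absolute URL
    fetch(response["location"], limit - 1)
  else
    raise "HTTP #{response.code} #{response.message} for #{url}"
  end
end
```

A call such as fetch("http://www.marsdrive.com/index.html") would return the text of the page. Network-level failures, such as an unknown host, still surface as exceptions (SocketError, for instance) and can be rescued by the caller as in the earlier example.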

The next section (18.2.9 "Using the Open-URI Library") will also be useful to you. It shows a slightly simpler way of handling this kind of task.

18.2.9. Using the Open-URI Library

The Open-URI library is the work of Tanaka Akira. Its purpose is to "unify" the programmatic treatment of Net resources so that they are all intuitive and easy to handle.

This code is essentially a wrapper for the net/http, net/https, and net/ftp libraries, making available an open method (URI.open in current versions of Ruby) that will handle an arbitrary URI. The example from the preceding section can be written this way:

require 'open-uri'

data = nil
URI.open("http://www.marsdrive.com/") {|f| data = f.read }

puts "Retrieved #{data.lines.size} lines, #{data.bytesize} bytes"

The file returned by open (f in the previous case) is not just a file. This object also has the methods of the OpenURI::Meta module so that we can access metadata:

# ...
uri = f.base_uri             # a URI object with its own readers
ct  = f.content_type         # "text/html"
cs  = f.charset              # "utf-8"
ce  = f.content_encoding     # []

The library lets you specify additional header fields by passing a hash to the open command. It also handles proxy servers and has several other useful features. There are cases where this library may be insufficient; for example, you may need to parse HTTP headers, buffer extremely large downloads, send cookies, or perform other such tasks. For more information on this library, see the online documentation at http://ruby-doc.org.