You want to display or process a specific web page.
The simplest solution is to use the open-uri library. It lets you open a web page as though it were a file. This code fetches the oreilly.com homepage and prints out the first part of it:
require 'open-uri'
# Since Ruby 3.0, Kernel#open no longer accepts URLs; call URI.open instead.
puts URI.open('http://www.oreilly.com/').read(200)
For more complex applications, you'll need to use the net/http library. Use Net::HTTP.get_response to make an HTTP request and get the response as a Net::HTTPResponse object containing the response code, headers, and body.
require 'net/http'
response = Net::HTTP.get_response('www.oreilly.com', '/about/')
response.code                 # => "200"
response.body.size            # => 21835
response['Content-type']      # => "text/html; charset=ISO-8859-1"
puts response.body[0,200]
Discussion
If you just want the text of the page, use Net::HTTP.get. If you also want the response code or the values of the HTTP response headers, use Net::HTTP.get_response.
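The difference between the two calls can be sketched without touching the network. The following is a minimal illustration, not a recommended server design: start_demo_server is a hypothetical helper that stands in for a real web site by answering two requests on a random local port, and the page text is made up for the example.

```ruby
require 'net/http'
require 'socket'
require 'uri'

# Hypothetical stand-in for a real site: a tiny server on a random local
# port that answers a fixed number of requests with the same page.
def start_demo_server(body, requests: 2)
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    requests.times do
      client = server.accept
      # Skip the request line and headers, up to the blank line.
      loop do
        line = client.gets
        break if line.nil? || line == "\r\n"
      end
      client.write "HTTP/1.1 200 OK\r\n" \
                   "Content-Type: text/plain\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n#{body}"
      client.close
    end
  end
  server.addr[1] # the port the server is listening on
end

port = start_demo_server('Hello from the demo server')
uri  = URI("http://127.0.0.1:#{port}/")

# get returns only the body, as a String.
text = Net::HTTP.get(uri)

# get_response returns the whole response: code, headers, and body.
response = Net::HTTP.get_response(uri)

puts text                      # Hello from the demo server
puts response.code             # 200
puts response['Content-Type']  # text/plain
puts response.body == text     # true
```

Against a real site you would pass its URI (or a host and path, as in the Solution) instead of the local address.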
The get_response method returns an instance of some subclass of Net::HTTPResponse, which contains all the information about an HTTP response. There's one subclass for every response code defined in the HTTP standard: for instance, HTTPOK for the 200 response code, HTTPMovedPermanently for the 301 response code, and HTTPNotFound for the 404 response code. There's also an HTTPUnknownResponse subclass for any response codes not defined in HTTP.
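To see the mapping from response codes to classes, you can peek at the lookup table that net/http keeps in the constant Net::HTTPResponse::CODE_TO_OBJ. This is a small sketch for exploration only; the table is closer to an implementation detail than to an API you would normally call in application code.

```ruby
require 'net/http'

# CODE_TO_OBJ maps a status code string to the response class net/http
# instantiates for it. Each class also belongs to a broader family
# (success, redirection, client error, and so on) through its superclass.
%w[200 301 404].each do |code|
  klass = Net::HTTPResponse::CODE_TO_OBJ[code]
  puts "#{code} -> #{klass} (a kind of #{klass.superclass})"
end
# 200 -> Net::HTTPOK (a kind of Net::HTTPSuccess)
# 301 -> Net::HTTPMovedPermanently (a kind of Net::HTTPRedirection)
# 404 -> Net::HTTPNotFound (a kind of Net::HTTPClientError)
```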
The only difference between these subclasses is the class name and the code member. You can check the response code of an HTTP response by comparing specific classes with is_a?, or by checking the result of HTTPResponse#code, which returns a String:
puts "Success!" if response.is_a? Net::HTTPOK
# Success!

puts case response.code[0] # Check the first byte of the response code.
  when ?1 then "Status code indicates an HTTP informational response."
  when ?2 then "Status code indicates success."
  when ?3 then "Status code indicates redirection."
  when ?4 then "Status code indicates client error."
  when ?5 then "Status code indicates server error."
  else "Non-standard status code."
end
# Status code indicates success.
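Because the response classes are grouped into families, the same check can also be written against the family classes rather than the first character of the code. The sketch below assumes a hypothetical helper named describe_response. The responses are constructed by hand only so the example runs offline; in real code they would come from Net::HTTP.get_response.

```ruby
require 'net/http'

# Hypothetical helper: describe a response by the family of its class.
def describe_response(response)
  case response
  when Net::HTTPInformation then 'informational'
  when Net::HTTPSuccess     then 'success'
  when Net::HTTPRedirection then 'redirection'
  when Net::HTTPClientError then 'client error'
  when Net::HTTPServerError then 'server error'
  else 'non-standard'
  end
end

# Built by hand for illustration; normally net/http creates these for you.
not_found = Net::HTTPNotFound.new('1.1', '404', 'Not Found')
moved     = Net::HTTPMovedPermanently.new('1.1', '301', 'Moved Permanently')

puts describe_response(not_found) # client error
puts describe_response(moved)     # redirection
```

Matching on the family class keeps working for codes such as 201 or 204 that have their own subclasses, which a comparison against Net::HTTPOK alone would miss.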
You can get the value of an HTTP response header by treating HTTPResponse as a hash, passing the header name into HTTPResponse#[]. The only difference from a real Hash is that the names of the headers are case-insensitive. Like a hash, HTTPResponse supports the iteration methods #each, #each_key, and #each_value:
response['Server']
# => "Apache/1.3.34 (Unix) PHP/4.3.11 mod_perl/1.29"
response['SERVER']
# => "Apache/1.3.34 (Unix) PHP/4.3.11 mod_perl/1.29"

response.each_key { |key| puts key }
# x-cache
# p3p
# content-type
# date
# server
# transfer-encoding
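The case-insensitive lookup can be tried without a live server. This is a minimal sketch: the response is built and its headers are set by hand purely for illustration, and the header values (including the DemoServer name) are made up.

```ruby
require 'net/http'

# Built by hand for illustration; a real response arrives with its
# headers already filled in by the server.
response = Net::HTTPOK.new('1.1', '200', 'OK')
response['Content-Type'] = 'text/html; charset=UTF-8'
response['Server']       = 'DemoServer/1.0' # hypothetical value

# Lookups ignore the case of the header name.
puts response['content-type'] # text/html; charset=UTF-8
puts response['SERVER']       # DemoServer/1.0

# Names are stored in lowercase, which is what the iterators yield.
response.each_header { |name, value| puts "#{name}: #{value}" }
# content-type: text/html; charset=UTF-8
# server: DemoServer/1.0
```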
If you make a request by calling Net::HTTP.get_response with no code block, Ruby will read the body of the web page into a string, which you can fetch with the HTTPResponse#body method. If you like, you can process the body as you read it, one segment at a time, by passing a code block to HTTPResponse#read_body:
Net::HTTP.get_response('www.oreilly.com', '/about/') do |response|
  response.read_body do |segment|
    puts "Received segment of #{segment.size} byte(s)!"
  end
end
# Received segment of 614 byte(s)!
# Received segment of 1024 byte(s)!
# Received segment of 848 byte(s)!
# Received segment of 1024 byte(s)!
# …
Note that you can only call read_body once per request. Also, there are no guarantees that a segment won't end in the middle of an HTML tag name or some other inconvenient place, so this is best for applications where you're not handling the web page as structured data: for instance, when you're simply piping it to some other source.
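As a rough illustration of the piping case, the sketch below copies each segment into a sink as it arrives, never holding the whole page in one string on the reading side. It runs offline under some assumptions: start_demo_server is a hypothetical stand-in for a real web server, PAGE is made-up content, and the StringIO sink takes the place of a file or socket.

```ruby
require 'net/http'
require 'socket'
require 'stringio'
require 'uri'

# Made-up page content, large enough to arrive in several segments.
PAGE = ('<p>Hello, streaming world!</p>' * 500).freeze

# Hypothetical stand-in for a real site: serves PAGE once on a random
# local port and returns that port.
def start_demo_server
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    client = server.accept
    # Skip the request line and headers, up to the blank line.
    loop do
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    client.write "HTTP/1.1 200 OK\r\n" \
                 "Content-Type: text/html\r\n" \
                 "Content-Length: #{PAGE.bytesize}\r\n" \
                 "Connection: close\r\n\r\n"
    client.write PAGE
    client.close
  end
  server.addr[1]
end

uri  = URI("http://127.0.0.1:#{start_demo_server}/")
sink = StringIO.new # stands in for a file or another socket
sizes = []

Net::HTTP.get_response(uri) do |response|
  response.read_body do |segment|
    sizes << segment.bytesize
    sink.write(segment) # pass each segment along as soon as it arrives
  end
end

puts "Copied #{sizes.sum} byte(s) in #{sizes.size} segment(s)"
```

The number and size of the segments depend on how the data happens to arrive, so only the total is predictable.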