Running Servlets with WEBrick

Credit: John-Mason Shackelford

Problem

You want to embed a server in your Ruby application. Your project is not a traditional web application, or else it's too small to justify the use of a framework like Rails or Nitro.

Solution

Write a custom servlet for WEBrick, a web server implemented in Ruby and included in the standard library (since Ruby 3.0 it is distributed separately as the webrick gem).^[7]

^[7] Don't confuse WEBrick servlets with Java servlets. The concepts are similar, but they don't implement the same API.

Configure WEBrick by creating a new HTTPServer instance and mounting servlets. The default FileHandler acts like a "normal" web server: it serves a URL-space corresponding to a directory on disk. It delegates requests for *.cgi files to the CGIHandler, renders *.rhtml files with ERb using the ERBHandler servlet, and serves other files (such as static HTML files) as they are.

The following server runs on port 8000 of your local machine and mounts three FileHandler servlets. Each one serves documents, CGI scripts, and .rhtml templates from a different directory on disk:

	#!/usr/bin/ruby
	# simple_servlet_server.rb
	require 'webrick'
	include WEBrick

	s = HTTPServer.new(:Port => 8000)

	# Add a MIME type for *.rhtml files
	HTTPUtils::DefaultMimeTypes.store('rhtml', 'text/html')

	# Required for CGI on Windows; unnecessary on Unix/Linux
	s.config.store(:CGIInterpreter, HTTPServlet::CGIHandler::Ruby)

	# Mount servlets: each FileHandler serves one directory tree
	s.mount('/', HTTPServlet::FileHandler, '/var/www/html')
	s.mount('/bruce', HTTPServlet::FileHandler, '/home/dibbbr/htdoc')
	s.mount('/marty', HTTPServlet::FileHandler, '/home/wisema/htdoc')

	# Trap signals so as to shut down cleanly.
	['TERM', 'INT'].each do |signal|
	  trap(signal) { s.shutdown }
	end

	# Start the server and block on input.
	s.start

Discussion

WEBrick is robust, mature, and easy to extend. Beyond serving static HTML pages, WEBrick supports traditional CGI scripts, ERb-based templating like PHP or JSP, and custom servlet classes. While most of WEBrick's API is oriented toward responding to HTTP requests, you can also use it to implement servers that speak another protocol. (For more on this capability, see the Daytime server example on the WEBrick home page.)

The first two arguments to HTTPServer#mount (the URL path to mount on and the servlet class) are used by the mount method itself; any additional arguments are simply passed along to the servlet. This way, you can configure a servlet while you mount it; the FileHandler servlet, for instance, requires an argument telling it which directory on disk contains the web content.

When a client requests a URL, WEBrick tries to match it against the entries in its mounting table. The mounting order is irrelevant. When more than one mount point matches a request path, WEBrick picks the longest match.
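The matching rule can be sketched in plain Ruby. resolve_mount below is a hypothetical helper, not WEBrick's code: it picks the longest mounted prefix that matches the request path, and it only accepts a prefix that ends at a slash or at the end of the path, so /bruce does not claim /brucefoo (WEBrick's own MountTable makes the same boundary check).

```ruby
# Hypothetical helper sketching longest-match mount resolution.
# mounts maps a URL prefix to whatever handles it; hash order does not matter.
def resolve_mount(mounts, path)
  matches = mounts.keys.select do |prefix|
    # '/' covers every path; other prefixes must end on a segment boundary.
    prefix == '/' || path == prefix || path.start_with?("#{prefix}/")
  end
  best = matches.max_by(&:length) # the longest matching prefix wins
  best && [best, mounts[best]]
end

mounts = {
  '/'      => '/var/www/html',
  '/bruce' => '/home/dibbbr/htdoc',
  '/marty' => '/home/wisema/htdoc'
}

p resolve_mount(mounts, '/bruce/index.html') # ["/bruce", "/home/dibbbr/htdoc"]
p resolve_mount(mounts, '/brucefoo')         # ["/", "/var/www/html"]
p resolve_mount(mounts, '/marty')            # ["/marty", "/home/wisema/htdoc"]
```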

When the request is for a directory (like http://localhost:8000/bruce/), the server looks for a file named index.html, index.htm, index.cgi, or index.rhtml, in that order. This is configurable via the :DirectoryIndex configuration parameter. The snippet below adds another file to the list of directory index files:

	s.config.store(:DirectoryIndex,
	 s.config[:DirectoryIndex] << "default.htm")

When the standard handlers provided by WEBrick won't work for you, write a custom servlet. Rubyists have written custom WEBrick servlets to handle SOAP and XML-RPC services, implement a WebDAV server, process eruby templates instead of ERb templates, and fork processes to distribute load on machines with multiple CPUs.

To write your own WEBrick servlet, simply subclass HTTPServlet::AbstractServlet and write do_ methods corresponding to the HTTP methods you wish to handle. Then mount your servlet class as shown in the Solution. The following example handles HTTP GET requests via the do_GET method, and POSTs via an alias. HEAD and OPTIONS requests are implemented in the AbstractServlet itself.

	#!/usr/bin/ruby
	# custom_servlet_server.rb
	require 'webrick'
	include WEBrick

	class CustomServlet < HTTPServlet::AbstractServlet
	  def do_GET(request, response)
	    response.status = 200 # Success
	    response.body = "Hello World"
	    response['Content-Type'] = 'text/plain'
	  end

	  # Respond to an HTTP POST just as we do to an HTTP GET.
	  alias :do_POST :do_GET
	end

	# Mount servlets.
	s = HTTPServer.new(:Port => 8001)
	s.mount('/tricia', CustomServlet)

	# Trap signals so as to shut down cleanly.
	['TERM', 'INT'].each do |signal|
	  trap(signal) { s.shutdown }
	end

	# Start the server and block on input.
	s.start

Start that server, visit http://localhost:8001/tricia/, and you'll see the string "Hello World".

Beyond defining handlers for arbitrary HTTP methods and configuring custom servlets with mount options, we can also control how often servlet instances are created. Ordinarily, WEBrick creates a new servlet instance for every request. Since each request has its own instance of the servlet class, you are free to write custom servlets without worrying about the servlet's state and thread safety (unless, of course, you share resources between servlet instances).

But you can get faster request handling, at the expense of a slower startup time, by moving some work out of the do_ methods and into the servlet's initialize method. Instead of creating a new servlet instance with every request, you can override the class method HTTPServlet::AbstractServlet.get_instance and manage a pool of servlet instances. This works especially well when your request-handling methods are reentrant, so that you can avoid costly thread synchronization.

The following example uses code from Recipe 12.13 to serve up a certificate of completion to the individual named by the HTTP request. We use the templating approach discussed in the PDF recipe to prepare most of the certificate ahead of time. During request handling, we do nothing but fill in the recipient's name.

The PooledServlet class below does the work of pooling the servlet handlers:

	#!/usr/bin/ruby
	# certificate_server.rb
	require 'webrick'
	require 'thread'
	require 'cgi'

	include WEBrick

	class PooledServlet < HTTPServlet::AbstractServlet

	  INIT_MUTEX = Mutex.new
	  SERVLET_POOL = []

	  @@pool_size = 2

	  # Build the pool once, so the costly initialization isn't repeated
	  # for every request, then hand out an idle instance.
	  def self.get_instance(server, *options)
	    unless SERVLET_POOL.size == @@pool_size
	      INIT_MUTEX.synchronize do
	        SERVLET_POOL.clear
	        @@pool_size.times { SERVLET_POOL << new(server, *options) }
	      end
	    end
	    # Wait until some instance is idle, yielding to other threads
	    # between attempts.
	    until (servlet = SERVLET_POOL.find { |candidate| !candidate.busy? })
	      Thread.pass
	    end
	    servlet
	  end

	  def self.pool_size(size)
	    @@pool_size = size
	  end

	  def busy?
	    @busy
	  end

	  def service(req, res)
	    @busy = true
	    super
	  ensure
	    # Mark the instance idle even if a handler raised (for example, an
	    # HTTPStatus error); otherwise it would never be handed out again.
	    @busy = false
	  end
	end

Note that by placing the synchronize block inside the unless block, we expose ourselves to the possibility that the servlet pool may be initialized more than once when the server first starts up. But that's not really a problem if it does happen, and putting the synchronize block there means we don't have to synchronize on every single request.
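PooledServlet also polls in a loop until some instance is idle, and it marks an instance busy only once service starts, so two threads can briefly pick the same idle servlet (tolerable when the handlers are reentrant). If that matters to you, one alternative technique, not the recipe's, is to keep idle instances in a Queue: pop blocks until an instance is free and hands each instance to exactly one caller. BlockingPool below is a hypothetical, framework-free sketch of that idea; wiring it into WEBrick would mean popping in get_instance and pushing the instance back at the end of service.

```ruby
require 'thread' # Queue is a core class; this require is harmless on modern Ruby

# Hypothetical blocking object pool (an alternative to busy-waiting).
class BlockingPool
  def initialize(size)
    @idle = Queue.new
    size.times { @idle << yield } # build every instance up front
  end

  # Number of instances not currently borrowed.
  def available
    @idle.size
  end

  # Borrow an instance for the duration of the block. Queue#pop blocks
  # until one is free, and the ensure clause returns it even if the
  # block raises.
  def with_instance
    instance = @idle.pop
    begin
      yield instance
    ensure
      @idle << instance
    end
  end
end

pool = BlockingPool.new(2) { Object.new }

# Two nested borrows get two different instances.
distinct = pool.with_instance { |a| pool.with_instance { |b| !a.equal?(b) } }
puts distinct # true

# An exception inside the block does not leak an instance.
begin
  pool.with_instance { raise 'handler failed' }
rescue RuntimeError
end
puts pool.available # 2
```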

You've heard it before: "Avoid premature optimization." Assumptions about the impact of the servlet pool size on memory consumption and performance often prove to be wrong, given the complexities introduced by garbage collection and the variation in the efficiency of various operations on different platforms. Code first, tune later.

Here's the application-specific code. The file certificate_pdf.rb should contain the Certificate class defined in the Discussion of Recipe 12.13.

When the servlet is initialized, we generate the PDF certificate, leaving the name blank:

	# Load certificate_pdf.rb from the same directory as this script
	require_relative 'certificate_pdf'

	class PDFCertificateServlet < PooledServlet

	  pool_size 10

	  def initialize(server, *options)
	    super
	    @certificate = Certificate.new(options.first)
	  end

When the client makes a request, we load the certificate, fill in the name, and send it as the body of the HTTP response:

	  def do_GET(request, response)
	    if name = request.query['name']
	      filled_in = @certificate.award_to(CGI.unescape(name))

	      response.body = filled_in.render
	      response.status = 200 # Success
	      response['Content-Type'] = 'application/pdf'
	      response['Content-Length'] = response.body.bytesize.to_s
	    else
	      raise HTTPStatus::Forbidden.new("missing attribute: 'name'")
	    end
	  end

The rest of the code should look familiar by now:

	  # Respond to an HTTP POST just as we do to an HTTP GET
	  alias :do_POST :do_GET
	end

	# Mount servlets
	s = HTTPServer.new(:Port => 8002)
	s.mount('/', PDFCertificateServlet, 'Ruby Hacker')

	# Trap signals so as to shut down cleanly.
	['TERM', 'INT'].each do |signal|
	  trap(signal) { s.shutdown }
	end

	# Start the server and block on input.
	s.start

Start this server, and you can visit http://localhost:8002/?name=My+Name to get a customized PDF certificate.
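Names with spaces or punctuation have to be escaped in the query string. The hypothetical helper certificate_url below does that with URI.encode_www_form from Ruby's standard library, which encodes spaces as +; the commented-out lines show one way to save the PDF with Net::HTTP while the server is running.

```ruby
require 'uri'

# Hypothetical helper: build the certificate URL for any name.
def certificate_url(name, host: 'localhost', port: 8002)
  "http://#{host}:#{port}/?#{URI.encode_www_form('name' => name)}"
end

puts certificate_url('My Name')      # http://localhost:8002/?name=My+Name
puts certificate_url('Ruby & Rails') # http://localhost:8002/?name=Ruby+%26+Rails

# With certificate_server.rb running, fetch and save the PDF:
#   require 'net/http'
#   pdf = Net::HTTP.get(URI(certificate_url('My Name')))
#   File.binwrite('certificate.pdf', pdf)
```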

The code above illustrates many other basic features of WEBrick: access to request parameters, servlet configuration at mount time, use of a servlet pool to handle expensive operations up front, and error pages.

Besides HTTPStatus::Forbidden, demonstrated above, WEBrick provides an exception class for each of the HTTP/1.1 status codes. The classes are not listed in the RDoc, but you can infer their names from the HTTPStatus::StatusMessage table. The class names correspond to the reason phrases given in the W3C reference listed below.

See Also