11.2 REST Principles | Programming Web Services with Perl

There are 12 principles, shown in Table 11-2. Most are either self-explanatory or will be covered later in this chapter.

Table 11-2. REST principles

Group	Principle
Resources	(1) A resource is anything that has identity. (2) Every resource has a URI. (3) A URI is "opaque," exposing no details of its implementation.
Protocol	(4) `GET` operations are "idempotent," free of side effects. (5) Any request that doesn't have side effects should use `GET`. (6) All interactions are stateless.
Representations	(7) Data and metadata formats are documented. (8) Data is available in multiple flavors. (9) Representations include links to other resources.
Style	(10) Document and advertise your service API. (11) Use available standards and technology. (12) Refine and extend architecture, standards and tools.

The first group of principles relates to standardized addressing (URI), which describes how to address resources. The second group relates to a standard application protocol (HTTP), which describes what operations can be applied to the resources. Stateless interaction in this context means that each request from client to server must contain all the information necessary to execute the request and can't take advantage of any stored context on the server. The third group relates to standard resource representations (HTML, GIF, JPG, XML-based vocabularies), which describe what data is passed to resources or accepted from them. Standardizing all these key elements of the REST approach enables widespread interoperability and lets clients, servers, and systems as a whole evolve rapidly and independently.

11.2.1 Aesthetics of URI Design

Uniform resource locators (URLs), like uniform resource names (URNs), are a subset of the general class of uniform resource identifiers (URIs). The distinction between URIs, URLs, URNs, and other kinds of resource identifiers is often more confusing than useful. ^[2] In this chapter, the term "URI" refers to all of them.

^[2] If you're interested, the paper "URIs, URLs, and URNs: Clarifications and Recommendations" describes the relationship among the concepts and the reason for the confusion. Check it out at http://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/.

11.2.1.1 URI syntax

Before going into detail about resource modeling, let's first review and elaborate on URIs as discussed in Chapter 2. An absolute URI reference consists of three parts: a scheme, a scheme-specific part, and a fragment identifier. Some schemes are hierarchical, allowing for both relative and absolute URIs (such as http:), and some aren't, allowing only absolute URIs (such as mailto:). For hierarchical namespaces, the scheme-specific part is broken down into authority, path, and query components. Relative URI references can be created by omitting the scheme and authority components (they are implied by the context of the URI reference). These forms of URI reference syntax are summarized as follows:

<scheme>:<scheme-specific-part>#<fragment>

<scheme>://<authority><path>?<query>#<fragment>

<path>?<query>#<fragment>
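As a rough illustration of these components, the following Python sketch uses the standard library's urllib.parse.urlsplit to break URI references into their parts. The URIs and host names are made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical absolute URI reference; host and path are illustrative only.
uri = "http://www.example.org/mailboxes/pavelkulchenko/messages/?after=20020831#from"

parts = urlsplit(uri)
print(parts.scheme)    # scheme
print(parts.netloc)    # authority
print(parts.path)      # path
print(parts.query)     # query
print(parts.fragment)  # fragment identifier

# A non-hierarchical scheme has no authority component: everything after
# the colon is the scheme-specific part, which urlsplit reports as the path.
mail = urlsplit("mailto:someone@example.org")
print(mail.scheme, repr(mail.netloc), mail.path)

# A relative reference omits the scheme and authority.
relative = urlsplit("messages/567/?after=20020831#from")
print(repr(relative.scheme), repr(relative.netloc), relative.path)
```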

11.2.1.2 Resource modeling

The strength and flexibility of REST come from the pervasive use of URIs. REST is about exposing resources through URIs, not services through messaging interfaces. On this view, URIs are acted on by HTTP methods, and the result of those actions is to transfer representations of some resource from the origin server to the client that initiated the request. Some REST advocates call the process of bringing an application into this model "resource modeling," i.e., how to model a given problem domain as a set of resources. Each component of the URI should be a resource for which operations make sense. One guiding principle is to ask "does it make sense to use GET, PUT, POST, or DELETE operations on this named resource?"

Most URIs are opaque to client software most of the time. In other words, a public API shouldn't depend on the structure of the URIs; the structure of the URI is irrelevant to client software. URI opacity opens the door to new URI schemes and different interpretations of URI spaces. It isn't possible to satisfy opacity requirements in all cases: URIs that include fragment identifiers aren't opaque to the client because the fragment identifier is evaluated on the client. For HTML, the fragment ID is the ID of an element (an anchor) within the HTML object. For XML, if it is just a word, it is the XML ID of an element in the document.

Putting any kind of method name, action, or process in a URI is typically considered poor style in REST (because resources are objects rather than processes or services, URI parts should be named for nouns rather than verbs). Using a name to identify a process is RPC-like; using a name to identify a processor is REST-like. The difference is important, because a processor can conceivably have its own state. It would seem that this is an academic debate about naming styles and conventions, but it is actually more important than that: at its core, this debate is about what a resource is or can be, and what the underlying models and design processes should be, to achieve REST-fulness. Decomposing a problem domain into a set of resource representations with a generic interface isn't the same as decomposing the same domain into a set of objects and type-specific operations on those objects. There is a mismatch between REST and the current process of modeling problem domains as systems of objects.

Imagine a description of a simple model with two domain entities: mailbox and message. Even when the model is restricted to representation as a hierarchical collection of resources, there are still a number of options to explore (pavelkulchenko is the name of the mailbox and 567 is the unique number of the message in the mailbox):

/mailbox=pavelkulchenko/message=567

/pavelkulchenko/567

/mailboxes/pavelkulchenko/messages/567

/mail?mailbox=pavelkulchenko;message=567

Even though the choice may not be immediately clear, for a number of reasons that will be covered later, this approach is the favored one:

/mailboxes/<mailboxId>/messages/<msgId>/

Using it, you can model resources. The following part refers to a collection of mailboxes:

/mailboxes/

This part refers to an instance, i.e., a particular mailbox:

/mailboxes/<mailboxId>/

This refers to a collection of messages:

/mailboxes/<mailboxId>/messages/

This refers to an instance, i.e., a particular message:

/mailboxes/<mailboxId>/messages/<msgId>/

Extending into the messages themselves, you'd use the following to refer to a fragment of the message:

/mailboxes/<mailboxId>/messages/<msgId>/#from

This refers to a set of messages filtered from a collection:

/mailboxes/<mailboxId>/messages/?after=20020831

You can continue and include parts/<partId> to address MIME parts of the message or include first/last instead of <msgId> to address the first or last messages in a mailbox.
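One way to see how this hierarchy maps onto resources is a small route table. The Python sketch below is illustrative only: the patterns mirror the URI templates above, and the assumption that identifiers never contain slashes is ours, not something the templates require.

```python
import re

# Hypothetical route table for the mailbox example. Identifiers are
# assumed to contain no slashes.
ROUTES = [
    ("mailbox collection", re.compile(r"^/mailboxes/$")),
    ("mailbox", re.compile(r"^/mailboxes/(?P<mailboxId>[^/]+)/$")),
    ("message collection",
     re.compile(r"^/mailboxes/(?P<mailboxId>[^/]+)/messages/$")),
    ("message",
     re.compile(r"^/mailboxes/(?P<mailboxId>[^/]+)/messages/(?P<msgId>[^/]+)/$")),
]

def classify(path):
    """Return (resource kind, identifiers) for a path, or None if unknown."""
    for kind, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            return kind, match.groupdict()
    return None

print(classify("/mailboxes/"))
print(classify("/mailboxes/pavelkulchenko/"))
print(classify("/mailboxes/pavelkulchenko/messages/"))
print(classify("/mailboxes/pavelkulchenko/messages/567/"))
```

Each level of the path names something on which GET, PUT, POST, or DELETE could sensibly operate, which is the test suggested earlier for a well-modeled resource.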

Slashes became the common syntax for a hierarchical boundary because they are familiar. Hierarchical schemes are common, and relative naming within a hierarchical space has many advantages. Relative naming allows small groups of documents that are located close together within a tree to refer to each other without being aware of their absolute position within any larger tree.
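Relative naming can be sketched with urllib.parse.urljoin from the Python standard library, which resolves a relative reference against a base URI. The host names below are invented for the example; the point is that the same relative references keep working when the whole subtree moves.

```python
from urllib.parse import urljoin

# Hypothetical base URI of one message.
base = "http://www.example.org/mailboxes/pavelkulchenko/messages/567/"

sibling = urljoin(base, "../568/")   # another message in the same mailbox
collection = urljoin(base, "../")    # the enclosing message collection
part = urljoin(base, "parts/1")      # a MIME part under this message

print(sibling)
print(collection)
print(part)

# The same relative reference resolved after the tree has moved elsewhere.
moved = "http://archive.example.net/2002/mailboxes/pavelkulchenko/messages/567/"
print(urljoin(moved, "../568/"))
```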

It's easy to say that URIs change because of a lack of forethought (and it's probably true in many cases), but it is important to realize that many properties of URIs are social rather than technical in nature. Creating URIs that can still be used one year or ten years from now requires thought, organization, and commitment on the part of the authorities assigning URIs.

There is nothing about HTTP that makes URIs unstable. URIs change when there is some information in them that changes: authors' names, status, corporate structure, database design, and project name are all things that can change over the course of time and thus change a URI. We're in the bind that we need information in the URI to access the resource it identifies, but the very act of putting information in the URI makes it fragile. The only way out of the bind is to include the minimum information that is enough to identify a resource.

It is important to emphasize that a URI points to a resource as a concept, rather than a document. Don't expose the mechanism of how a server runs (cgi-bin, servlet, or mod_perl URIs). Make a change to the mechanism (even though the content stays the same), and all the URIs change. It's highly unlikely that, if you follow these principles, you'll come up with a URL such as http://www.foo.org/cgi-bin/articles.pl?newsid=1234567. Instead, it is more likely that if you follow REST principles you'll get something more like http://www.foo.org/articles/1234567.

This much space has been devoted to URIs because it's extremely important to properly understand and implement resource names.

11.2.2 Methods

The HTTP specification provides a number of generic methods (the HEAD, GET, and POST methods were discussed in Chapter 2), but only five are relevant to this discussion: GET, HEAD, POST, PUT, and DELETE. Table 11-3 recalls the operations from Chapter 2 and defines the new ones.

Table 11-3. REST methods

Method	Meaning
`GET`	Retrieves a representation of a resource
`HEAD`	Retrieves metadata about a resource and its representation, without the representation itself
`POST`	Inserts, updates, or extends a resource; may change the state of other resources
`PUT`	Creates, updates, or replaces a resource
`DELETE`	Deletes a resource

Unifying the method vocabulary provides tremendous opportunities for simplifying interactions. It is precisely because HTTP has few methods that HTTP clients and servers can grow and be extended independently without confusing each other. Essentially, what are needed are methods that correspond to the "CRUD" concept: Create, Retrieve, Update, and Delete. In HTTP, retrieval is GET, deletion is DELETE, and creation and update are shared between POST and PUT.

11.2.2.1 GET method

To get a representation from a resource, a client uses the HTTP method GET. This operation is idempotent, which means that a client may use the result of a previous operation instead of repeating it; the state of the server shouldn't be changed in ways that are visible to the client. GET is restricted to a single URL line, which enforces one of the design principles: everything interesting should be URL-addressable. If you want to create a system in which resources aren't URL-addressable, you need to justify that decision.

11.2.2.2 POST method

Modifying a resource uses the POST operation. The meaning of POST can be ambiguous. It can mean append, create, remove, or modify a portion of a resource, or something else entirely. This ambiguity leads to a number of misuses. For example, in violation of one of the REST principles, POST can be used for data queries without side effects.

A web-based address book from a well-known vendor illustrates this point. Users can submit a query with an arbitrary string using the POST method, and submit a query with a predefined unique identifier using the GET method. If a client wants to link to the address book using the name of a given person, it isn't directly possible (a client can use a scripting language to submit the form, but this isn't always feasible). This is an example of design that violates REST Principle 5: any request that doesn't have side effects should use GET.

Here's a good test of REST-fulness: can an application do a GET on the same URLs it POSTs to, and if so, does it get something that in some way represents the state of what it has been building with the POST operations?

11.2.2.3 DELETE method

To delete a resource, use the DELETE method on a valid URL. ^[3]

^[3] You may note that the DELETE method is supposed to be used to delete the resource. However, the POST method can delete resources too. Consider a situation in which a mailbox has to be deleted along with all its messages. That operation would be executed using POST, rather than DELETE, because more than one resource is involved. At the same time, it is possible to DELETE a resource that has multiple representations. For instance, a DELETE /image operation may delete GIF, PNG, and JPG representations of the resource.

11.2.2.4 PUT method

To change the representation of a resource, a client uses the PUT method. There is a subtle difference between the POST and PUT methods. To distinguish between the two, consider a simple example that will be described in greater detail later in this chapter. To get an image of a cover page, a technical editor may use a service that creates the image from a template based on a number of parameters (such as title, cover, or animal), whereas a graphic designer may create the same image in a program such as Photoshop and submit it.

The editor uses POST to submit parameters; the server application modifies a resource based on the submitted parameters and returns the URL of the modified resource (which might be different from the URL the request was submitted to). The designer uses PUT to submit the image; the server modifies the representation. Both can update a resource or create a resource if there is none, but the difference is that PUT (and DELETE) operates on a representation as a whole, whereas POST submits information that causes a server to modify, create, or delete the resource (or multiple resources), often at a URL that is different from the one the request was submitted to.
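The contrast can be sketched with a toy in-memory service. Everything in this Python example is hypothetical (the class name, the /covers/ URIs, the parameter values, and the string that stands in for a rendered image); it shows only who chooses the URI and what is submitted in each case.

```python
from itertools import count

class CoverService:
    """In-memory stand-in for the cover image service (hypothetical)."""

    def __init__(self):
        self.resources = {}    # URI path -> representation
        self._ids = count(1)   # server-assigned identifiers

    def get(self, uri):
        # GET returns the current representation and changes nothing.
        return self.resources.get(uri)

    def put(self, uri, representation):
        # PUT: the client names the URI and supplies the whole
        # representation, which replaces whatever was stored there.
        self.resources[uri] = representation
        return uri

    def post(self, uri, params):
        # POST: the client submits parameters; the server decides which
        # resource to create or modify and reports that resource's URI.
        new_uri = f"{uri}{next(self._ids)}"
        self.resources[new_uri] = "rendered cover: {title} / {animal}".format(**params)
        return new_uri

service = CoverService()
editor_uri = service.post("/covers/", {"title": "Web Services", "animal": "spider"})
designer_uri = service.put("/covers/final", "image bytes from the designer")
print(editor_uri, "->", service.get(editor_uri))
print(designer_uri, "->", service.get(designer_uri))
```

Note that the editor learns the resulting URI from the response, while the designer already knows it before sending the request.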

A PUT request may also include a Content-Range header to request modification of only a portion of the entity. It may also include the If-Match and If-None-Match headers to indicate which entity versions to modify. Thus, including the header If-None-Match: * in a PUT request allows a new entity to be created only if it doesn't already exist.
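The conditional behavior can be sketched as a small decision function. This Python example is a simplification under stated assumptions: the store is a plain dictionary, the entity tag is a truncated hash chosen for the example, and only the * form of If-None-Match and a single entity tag in If-Match are handled. The status codes (201 Created, 204 No Content, 412 Precondition Failed) are the standard HTTP ones.

```python
import hashlib

def etag_of(body):
    """Hypothetical entity tag: a quoted, truncated hash of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_put(store, uri, body, if_match=None, if_none_match=None):
    """Apply a PUT to `store` (dict of uri -> bytes); return a status code."""
    current = store.get(uri)
    if if_none_match == "*" and current is not None:
        return 412  # Precondition Failed: the entity already exists
    if if_match is not None:
        if current is None:
            return 412  # nothing to match against
        if if_match != "*" and if_match != etag_of(current):
            return 412  # the client's copy is stale
    store[uri] = body
    return 201 if current is None else 204

store = {}
print(conditional_put(store, "/covers/final", b"v1", if_none_match="*"))
print(conditional_put(store, "/covers/final", b"v2", if_none_match="*"))
print(conditional_put(store, "/covers/final", b"v2", if_match=etag_of(b"v1")))
```

The first call creates the entity, the second is refused because the entity now exists, and the third succeeds because the client's entity tag matches the stored version.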

11.2.3 Security

REST greatly simplifies security, and part of the benefit is sociological. Where RPC protocols try as hard as possible to make the network look as if it isn't there (such as the SOAP::Lite module's support of the autodispatch mode, which tries to make remote calls look local), REST requires that the software developer design a network interface in terms of methods and resources. Whereas RPC interfaces encourage clients to view incoming messages as method parameters to be passed directly and automatically to programs, REST requires a certain disconnect between the interface (which is REST-oriented) and the implementation (which is usually object-oriented).

Resource-centric web services are inherently firewall-friendly: only four main operations are permitted (not counting HEAD). Server configurations can apply the four basic permissions to each data object (GET, POST, PUT, and DELETE), and they mean what they say: GET means get, and DELETE means delete. Many REST proponents believe that it will be impossible to securely and widely deploy a protocol across administrative boundaries without knowing the precise meaning of the methods. Using the hierarchical nature of the resource URI, servers may allow or disallow specific operations on subresources.
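A minimal sketch of per-subtree permissions might look like the following Python example. The rule table, the longest-prefix policy, and the decision to treat HEAD like GET are all assumptions made for illustration, not a description of any particular server.

```python
# Hypothetical access rules: URI prefix -> set of permitted methods.
RULES = {
    "/mailboxes/": {"GET"},
    "/mailboxes/pavelkulchenko/": {"GET", "POST", "PUT", "DELETE"},
    "/mailboxes/pavelkulchenko/messages/archive/": {"GET"},
}

def allowed(method, path):
    """Permit `method` on `path` according to the longest matching prefix."""
    matches = [prefix for prefix in RULES if path.startswith(prefix)]
    if not matches:
        return False  # nothing is permitted outside the known hierarchy
    longest = max(matches, key=len)
    # HEAD is treated like GET here, since it only retrieves metadata.
    if method == "HEAD":
        method = "GET"
    return method in RULES[longest]

print(allowed("GET", "/mailboxes/someone/messages/1/"))
print(allowed("DELETE", "/mailboxes/someone/messages/1/"))
print(allowed("DELETE", "/mailboxes/pavelkulchenko/messages/567/"))
print(allowed("DELETE", "/mailboxes/pavelkulchenko/messages/archive/1/"))
```

Because the methods have fixed meanings, a rule such as "GET only under this prefix" can be enforced without knowing anything about the application behind the URI.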

HTTP authentication and authorization are topics that most web developers are already familiar with. It's easy to assume that a service can't hide on the Internet, but the sole fact of making a resource addressable doesn't make it accessible. For instance, servers can create cryptographically unguessable URIs or associate unguessable signatures with each URI to ensure that the resource gets accessed only from a legitimate source. Unguessable URIs are essentially a form of security mechanism known as a capability. Capability security is simple, but it requires software and end users to adhere to the discipline that sharing a URI is equivalent to sharing the resource (a widely used service that employs this method is QuickTopic, available at http://quicktopic.com/).
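Both ideas can be sketched with the Python standard library. The URI layout, the sig query parameter, and the key handling below are assumptions made for illustration; they are not how QuickTopic or any other service is known to work.

```python
import hashlib
import hmac
import secrets

# Server-side secret, generated here for illustration; a real deployment
# would keep a stable key in its configuration.
SECRET_KEY = secrets.token_bytes(32)

def unguessable_uri(prefix="/topics/"):
    """Mint a URI whose last segment carries 128 bits of randomness."""
    return prefix + secrets.token_urlsafe(16)

def sign(path):
    """Append an HMAC-SHA256 signature to a path as a `sig` parameter."""
    digest = hmac.new(SECRET_KEY, path.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{path}?sig={digest}"

def verify(signed_uri):
    """Return True only if the signature matches the path it is attached to."""
    path, sep, digest = signed_uri.partition("?sig=")
    if not sep:
        return False
    expected = hmac.new(SECRET_KEY, path.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(digest, expected)

print(unguessable_uri())
signed = sign("/mailboxes/pavelkulchenko/messages/567/")
print(signed, verify(signed))
```

In both variants, whoever holds the full URI holds the capability, so the URI has to be handled with the same care as the resource itself.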