7.2. The Atom Entry Document in Detail

Now that you have the building blocks of Atom, let's move on to the details. We'll first look at the standard elements of an Atom entry document.

Atom entry documents not only make up the bulk of an Atom feed but are also used as the transport for the Atom Publishing API and as a format for web site archives. For example, using the Atom entry document format as an archive template for your weblog seems an increasingly good idea.

7.2.1. The Elements of Atom Entry


entry

Within an Atom entry document, the entry element is the root, and it must have a version attribute to denote the version of Atom you are deploying. This book is based on draft-05, whose version identifier is draft-ietf-atompub-format-05: do not deploy. Subtlety isn't its strong point, you have to admit. This element may also contain any number of XML namespace declarations for the use of other XML vocabularies. I cover this in Chapter 11.

If the entry is part of a feed document, this element has no attributes. Either way, the remainder of the elements are all children of entry.


title

The title element is a Text construct that gives the title of the entry. The entry must have one, and only one.


link

link, a Link construct, gives details of related URIs. There must be at least one with a rel attribute of alternate, but there can't be more than one of these with the same type value. This most commonly points to the HTML version of the resource, as with the link element in both flavors of RSS.

You can have as many link elements as you wish with a rel of something other than alternate. We'll talk about those later on in this chapter.
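
As a rough illustration of the alternate rule, the fragment below sketches an entry carrying two alternate links. They have different type values, so both are allowed; a second text/html alternate would not be. The URLs are placeholders, and the namespace declaration and the other required entry elements are left out for brevity.

```xml
<entry>
  <!-- One alternate link per media type: an HTML page and a plain-text version -->
  <link rel="alternate" type="text/html" href="http://example.org/2005/01/post.html"/>
  <link rel="alternate" type="text/plain" href="http://example.org/2005/01/post.txt"/>
  <!-- title, author, id, updated, and so on would follow here -->
</entry>
```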


edit

The edit element is a Service construct pointing to the edit endpoint for this particular entry for use with the Atom Publishing API. You can only have one of these, but it is optional.


author

author is mandatory, unless the document is within a feed that has already declared an author for everything. It's a Person construct, denoting the primary author of the entry, and you can only have one of them. For multiple authors, you have to decide who the most important one was and demote the others to contributor. If necessary, fight.


host

host is optional and conveys the domain name, dotted IPv4 address, or IPv6 colon-delimited address associated with the origin of the entry document. Confused? Me too, until I saw it came from the need to give authorship details of posts from wikis. In many cases, the author of a wiki article isn't known by anything other than the IP address she posted from. This element is for that situation.


contributor

contributor is a Person construct, entirely optional and unlimited in number, that denotes a contributor to the entry. You must have an author before you start talking about contributors, however.
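
Here is a minimal sketch of how one author and several contributors might sit together in an entry. The contributor names are invented for illustration, only the name child of the Person construct is shown, and the rest of the entry is omitted.

```xml
<entry>
  <!-- Exactly one primary author -->
  <author>
    <name>Ben Hammersley</name>
  </author>
  <!-- Any number of contributors, each a Person construct like author -->
  <contributor>
    <name>A. N. Other</name>
  </contributor>
  <contributor>
    <name>Jane Example</name>
  </contributor>
  <!-- title, link, id, updated, and so on would follow here -->
</entry>
```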


id

id is a URI construct that provides a URI for the entry. See Sidebar 7-1 for more on this.


category

category is a Category construct that provides a category for the entry document.


updated

A Date construct, the updated element must be present, once, within an entry. It denotes the last time the content changed in a way that the producer deems significant. So, you don't need to change this if you're fixing spelling mistakes, for example.


published

published is a Date construct denoting "an instant in time associated with an event early in the lifecycle of the entry," according to the specification document. Basically, this means either when it was written or when it was made available to the public. These are different things, granted, but there is no way to tell the difference within Atom's standard elements as yet.

You can, curiously, also set the published element to a value in the future. This suggests to applications that the entry shouldn't be displayed until that time, but applications don't have to pay any attention to this and can go ahead and display it anyway. No manners, some people.
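
To show the two Date constructs side by side, here is a small sketch of an entry that was first made public on one day and significantly revised five days later. The timestamps are made up and follow the format used in Example 7-3; the rest of the entry is omitted.

```xml
<entry>
  <!-- An instant early in the entry's life: here, when it was first made public -->
  <published>2004-10-20T09:30:00Z</published>
  <!-- The last change the producer considered significant -->
  <updated>2004-10-25T15:07:02Z</updated>
  <!-- title, link, author, id, and so on would follow here -->
</entry>
```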


summary

The summary element, in brief, is a Text construct that gives a short summary or extract of the entry. It's optional if there is a content element, and, like the Highlander, there can be only one.

If there is no content element, summary is mandatory. The summary is also mandatory if the content has an src attribute and is therefore empty, or if the content is encoded in Base64. As for that, we're just getting to it.


content

The concept of content within an Atom entry document differs slightly from that within an RSS feed. Within Atom, as with RSS, you can include the content directly within the entry document, but you can also just link to the content placed within a different file. (Although, as detailed later, you're discouraged from doing this with text content.) Furthermore, you can include any form of content (inside the feed, or linked to externally) and not just text or HTML.

The content element is its own construct, consisting of two attributes, type and src, and its own content:

  • type may be TEXT, HTML, or XHTML (following the same rules as the Text construct) or, if it is none of these, a valid MIME media type as per RFC 2045. If the type attribute is missing, it is considered to be equal to TEXT with all of the ramifications detailed for the earlier Text construct.

  • The src attribute may be a URI, which the application may dereference to retrieve the content. If the src attribute is present, the content element must be empty, and the type must be a MIME type and not TEXT, HTML, or XHTML. The MIME type returned by the server providing the resource is definitive, however. In other words, the feed might say something is x, but if the server says it's y, you should treat it as y.

    Finally, if the value of type begins with "text/" or ends with "+xml", the content should be included within the feed itself wherever possible.
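
The src rules are easier to see in a fragment. The sketch below assumes an entry whose content is an image held elsewhere: the content element is empty, its type is a MIME type, and a summary is supplied because the content is out of line. The image URL and summary text are placeholders, and the rest of the entry is omitted.

```xml
<entry>
  <!-- Out-of-line content: src is set, so the element is empty and type is a MIME type -->
  <content type="image/png" src="http://example.org/images/chart.png"/>
  <!-- summary is mandatory here because the content element itself is empty -->
  <summary>A chart of weekly feed subscriptions.</summary>
  <!-- title, link, author, id, updated, and so on would follow here -->
</entry>
```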


copyright

This is a Text construct that conveys copyright information for the entry. It's optional, and only one can be present. If it's not there, the copyright of the feed document takes over. If it is there, it takes precedence over that of the feed.
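
Pulling these elements together, a standalone entry document might look something like the sketch below. Treat it as an illustration rather than a reference: the namespace URI is the one used in Example 7-3, the URLs and text are placeholders, and the element order is simply the order in which the elements were described above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- A standalone entry: the root carries the version attribute and the namespace -->
<entry version="draft-ietf-atompub-format-05: do not deploy"
       xmlns="http://purl.org/atom/ns#draft-ietf-atompub-format-05">
  <title>A Standalone Entry</title>
  <link rel="alternate" type="text/html" href="http://example.org/2004/10/standalone.html"/>
  <author><name>Ben Hammersley</name></author>
  <id>http://example.org/2004/10/standalone</id>
  <published>2004-10-20T09:30:00Z</published>
  <updated>2004-10-25T15:07:02Z</updated>
  <!-- summary is optional here because inline content is present -->
  <summary>A one-line summary of the entry.</summary>
  <content type="TEXT">The full text of the entry goes here.</content>
  <copyright>Copyright 2004 Ben Hammersley</copyright>
</entry>
```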

7.2.2. The Atom Feed Document in Detail

So, armed with handfuls of entry documents, we can make a feed. Feeds have their own elements too. Here they are:


feed

The feed element is always the root element of a feed document. Like the entry element within a standalone entry document, it takes a single attribute, version, which in the case of this version of the specification equals draft-ietf-atompub-format-05: do not deploy.

Everything is a child of this element. It takes two kinds of direct children: exactly one head, and zero or more entry elements, each containing an entry document.


head

The head element is a container for the metadata of a feed. The rest of the elements in this section are children of this head element. It may also contain properly namespace-qualified elements from other XML vocabularies, as you'll see in Chapter 11.


title

A Text construct giving the title of the feed. It is mandatory.


link

As with its namesake within the entry document, link is a Link construct, giving details of related URIs. If there is no content element within an entry, there must be at least one link with a rel attribute of alternate. There can't be more than one with the same type value. link is most commonly used to point to the HTML version of the resource, as with the link element in both flavors of RSS.

We'll talk about the other types of link later on in this chapter.

If a feed's link rel="alternate" element resolves to an HTML document, then that document should have an autodiscovery link element that reflects back to the feed. We discuss this in Chapter 9.


introspection

The introspection element is a Service construct giving the URI of a site's introspection file. It's optional, and you can only have one.[1]

[1] The idea of an introspection file is also a matter of debate. It is used with the Atom API and is a separate file containing the URIs of the Atom API endpoints for all the sites within that domain, for each of the API methods. There is no current standard for the introspection file, and perhaps there never will be. Certainly, the presence of the post and edit elements takes much of its place. As I keep stressing, in using the Atom standards, you are on the bleeding edge of syndication technology, which is itself built on the bleeding edge of publishing technology. It's not inconceivable that things will drop off every so often.


post

This element is a Service construct that conveys the URI used to add entries to the feed, using the Atom Publishing API. It's optional, and, yes, only one is allowed.


author

As with the Atom entry document, the author element is a Person construct to denote the primary author of the feed and the entries found within it. As noted in the entry document section, the person denoted by feed/head/author is overruled by anyone denoted by feed/entry/author. However, if the majority of your entries are authored by the same person, use of this element saves time. Either way, unless all your entries have their own author element, it is mandatory. You can, naturally, have only one.


contributor

Basically, this is the same as the author element, except that it is used to denote any other authors. The rules of precedence are exactly the same as those for author.


category

category is a Category construct that provides a category for the entire feed document.


tagline

A Text construct giving a description or tagline for the feed. Optional; only one is allowed; brevity and wit are appreciated.


id

An Identity construct giving a unique, permanent identifier for this feed. The feed's URI, in other words. It's optional, but you can have only one.


generator

An optional element denoting the software used to create the feed. This is useful for statistics and for error tracking. You can have only one of these elements, obviously. The specification document puts it succinctly:

The content of this element must be a string that is a human-readable name for the generating agent. The element may have a "uri" attribute whose value must be a URI. When dereferenced, that URI should produce a representation that is relevant to that agent. The generator element may have a "version" attribute that indicates the version of the generating agent. When present, its value is unstructured text.
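
Read literally, that passage suggests something like the following sketch. The tool name, URI, and version string are all invented for illustration, and the other head elements are omitted.

```xml
<head>
  <!-- Element content: a human-readable name. uri and version are optional attributes. -->
  <generator uri="http://example.org/feedtool/" version="1.2">Example Feed Tool</generator>
  <!-- title, link, author, updated, and so on would follow here -->
</head>
```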


copyright

A Text construct conveying human-readable copyright information for the entire feed and all its entries except those that contain their own copyright element. It's optional, and the feed itself can have only one. It shouldn't be used to convey machine-readable information.


info

This is a Text construct giving a human-readable explanation of the format itself. It's optional and really just a place for people to leave notes to other developers. It isn't meant to be used by any application and is only viewable if you look directly at the source.


updated

A Date construct, the updated element must be present, once, within a feed. As with the entry document equivalent, it denotes the last time the content changed enough for the publisher to want readers to know about it.
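
As a sketch of how these fit together, here is a head element that goes a little beyond the bare minimum. All of the values are placeholders. I have left out category, post, and introspection, because the exact syntax of the Category and Service constructs isn't covered in this section.

```xml
<head>
  <title>An Example Weblog</title>
  <link rel="alternate" type="text/html" href="http://example.org/"/>
  <!-- Applies to every entry that lacks its own author element -->
  <author><name>Ben Hammersley</name></author>
  <tagline>Notes on syndication, mostly.</tagline>
  <id>http://example.org/</id>
  <!-- Applies to every entry that lacks its own copyright element -->
  <copyright>Copyright 2005 Ben Hammersley</copyright>
  <!-- A note for humans reading the source, not for applications -->
  <info>This is an Atom feed. Subscribe to it with a newsreader.</info>
  <updated>2005-01-15T12:00:00Z</updated>
</head>
```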

So there you go: the entire makeup of an Atom feed, as of January 2005. Again, be aware that Atom is a changing specification. I am judging, perhaps wrongly, that the specification won't change radically from the one detailed here (and if it does, you are now in a fine position to understand the changes), but before you deploy the format in anything resembling a permanent manner, go and check the latest documents.

Atom documents should be served under a MIME type of application/atom+xml with the file extension .atom.


Alternate link Types

The link element allows entry and feed documents to be linked to others. The most common use is a rel="alternate" link that points to an HTML version of the content within the Atom document. This, indeed, is mandatory.

There are many other types of link currently proposed within the Atom community. None of them is, as yet, part of the actual specification itself, and none of them is, again as yet, supported by the common newsreader applications. However, they are popular, and the debate is raging around them.

The current guideline is that "implementations MUST consider the link relation type to be equivalent to the same name registered within the IANA Registry of Link Relations (Section 10), and thus the URI that would be obtained by appending the value of the rel attribute to the string 'http://www.iana.org/assignments/relation/'." The value of rel describes the meaning of the link but doesn't impose any behavioral requirements on implementations.
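
To make the appending rule concrete, here is a small sketch. The href is a placeholder, and the comment simply spells out the URI you get by following the guideline for a rel of alternate.

```xml
<!-- rel="alternate" is treated as equivalent to the URI
     http://www.iana.org/assignments/relation/alternate -->
<link rel="alternate" type="text/html" href="http://example.org/index.html"/>
```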

The discussion around this can be tracked on the atom-syntax list, and a series of examples can be found on the wiki at http://intertwingly.net/wiki/pie/LinkTagMeaning.

The upshot at the moment is that under draft-05, only two types of link are allowed: alternate and related.

A link rel="related" element, the only other type allowed, can appear only within entries, and should point to either web pages or other Atom entry documents that are related in some way to the entry in hand. For example:

<link rel="related" type="text/html" href="http://www.example.org/related.html" hreflang="en"/>


7.2.3. The Simplest Possible Thing That Will Actually Work

What, therefore, is the simplest possible Atom feed document? Technically speaking, you don't need to have any entries at all, but that's as close to useless as you're allowed to get. Assuming one entry, Example 7-3 shows the simplest possible Atom feed. If your feed is missing any of these elements, it is incomplete.

Example 7-3. The simplest possible Atom feed, with one entry
<?xml version="1.0" encoding="utf-8"?>
<feed version="draft-ietf-atompub-format-05: do not deploy"
      xmlns="http://purl.org/atom/ns#draft-ietf-atompub-format-05">

  <head>
    <title>The Simplest Feed</title>
    <link rel="alternate" type="text/html" href="http://example.org/index.html"/>
    <author><name>Ben Hammersley</name></author>
    <updated>2004-10-25T15:07:02Z</updated>
  </head>

  <entry>
    <title>The Simplest Entry Document</title>
    <link rel="alternate" type="text/html" href="http://example.org/example_entry"/>
    <author><name>Ben Hammersley</name></author>
    <id>http://example.org/2004/12345679</id>
    <updated>2004-10-25T15:07:02Z</updated>
    <content type="TEXT">Simple Simple Simple</content>
  </entry>
</feed>