Here is a working implementation of the Atom API. Well, part of it anyway. It supports introspection (to discover what functions are supported), creating new entries, editing existing entries, deleting entries, retrieving entries, and searching for entries. It does not support editing user preferences, getting or setting categories, or adding comments.

This implementation is entirely self-contained within a single CGI script and runs on a default Apache install with no .htaccess tricks. Hopefully this will clear up a number of misconceptions about REST APIs in general and the Atom API in particular, especially the one about it being impossible to implement as a single CGI script on a default Apache install with no .htaccess tricks. It is possible; I’m doing it.

The introspection file, which tells the client what functions this server supports, is pointed to by a LINK tag on the home page. My introspection LINK tag looks like this:

<link rel="service.edit" type="application/x.atom+xml" href="/cgi-bin/atom.cgi/service=edit" />

This tells us that the introspection file is at http://diveintomark.org/cgi-bin/atom.cgi/service=edit. Retrieving this URL with a normal HTTP GET returns this file:

<introspection xmlns="http://purl.org/atom/ns#">
  <create-entry>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14</create-entry>
  <search-entries>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/service=search</search-entries>
</introspection>

The create-entry line tells us that we can create a new entry by POSTing an Atom entry (as XML) to the URL http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14. This action requires authentication, which I’ll describe in a minute. (For the purposes of this demonstration, reading the introspection file does not require authentication, although real implementations might require it.)
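
As a rough illustration of what a client might do with that file, here is a minimal sketch in Python that maps each function name to its URL. It is not taken from atom.cgi or from any real client; the function name parse_introspection is made up for this example, and the XML is embedded as a string so nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"

# The introspection file shown above, embedded so the sketch runs offline.
INTROSPECTION = """<introspection xmlns="http://purl.org/atom/ns#">
  <create-entry>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14</create-entry>
  <search-entries>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/service=search</search-entries>
</introspection>"""


def parse_introspection(xml_text):
    """Map each supported function name (e.g. 'create-entry') to its URL.

    Hypothetical helper: elements outside the Atom namespace are skipped.
    """
    root = ET.fromstring(xml_text)
    prefix = "{%s}" % ATOM_NS
    services = {}
    for child in root:
        if child.tag.startswith(prefix):
            services[child.tag[len(prefix):]] = (child.text or "").strip()
    return services


services = parse_introspection(INTROSPECTION)
print(services["create-entry"])
print(services["search-entries"])
```

A client written this way never hard-codes the server's URL scheme; it only uses the URLs the introspection file hands it.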

The search-entries line tells us that we can get minimal information about entries (probably just title and id, although the server is allowed to return more information) using the URL http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/service=search. We can’t use this URL in isolation; it requires at least one query string parameter. Any of the following syntaxes are supported:

http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/service=search?atom-all
Returns information on all the entries
http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/service=search?atom-recent=5
Returns information on the 5 most recent entries. Takes any positive integer. Note that this is not part of Atom API draft 7, but it will be added in draft 8.
http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/service=search?atom-start-range=5&atom-end-range=14
Returns information on entries 5 through 14, inclusive on both ends. The most recent entry is 0, second-most-recent is 1, etc. May return fewer than the requested number of entries, if there aren’t enough entries (atom-end-range is out of range). This will not raise an error. May return an empty list of entries, if atom-start-range is out of range, or if atom-start-range is greater than atom-end-range. This will not raise an error either.
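
The range rules are easy to get subtly wrong, so here is a small sketch of both sides: building the three query strings on the client, and selecting a range on the server. The names search_url and select_range are invented for this example, and treating a negative start as "out of range" is my assumption, since the text above only talks about positive values.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/service=search"


def search_url(base, recent=None, start=None, end=None):
    """Build one of the three supported search URLs (hypothetical helper)."""
    if recent is not None:
        return base + "?" + urlencode({"atom-recent": recent})
    if start is not None and end is not None:
        return base + "?" + urlencode(
            {"atom-start-range": start, "atom-end-range": end})
    return base + "?atom-all"


def select_range(entries, start, end):
    """Return entries start..end inclusive; entries[0] is the most recent.

    Out-of-range or inverted ranges yield a shorter or empty list,
    never an error. Rejecting a negative start is an assumption.
    """
    if start < 0 or end < start:
        return []
    # Slicing silently clips at the end of the list.
    return entries[start:end + 1]


print(search_url(SEARCH_URL, start=5, end=14))
print(select_range(list(range(10)), 5, 14))  # only ten entries exist
```

Python's slice semantics happen to match the "no error" behavior described above, which is why the server side is so short.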

For the purposes of this demonstration, search results do not require authentication.

Search results look like this:

<search-results xmlns="http://purl.org/atom/ns#">
<entry>
  <title>Unit Test 1</title>
  <id>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2412</id>
</entry>
<entry>
  <title>Testing</title>
  <id>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2408</id>
</entry>
</search-results>

The server may, at its discretion, include more information than that in search results, but title and id are required.
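
For illustration, a client could pull the required title and id out of those results along these lines. This is a sketch only; parse_search_results is a name I made up, and any extra elements a server chooses to include are simply ignored.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://purl.org/atom/ns#"}

# The search results shown above, embedded so the sketch runs offline.
SEARCH_RESULTS = """<search-results xmlns="http://purl.org/atom/ns#">
<entry>
  <title>Unit Test 1</title>
  <id>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2412</id>
</entry>
<entry>
  <title>Testing</title>
  <id>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2408</id>
</entry>
</search-results>"""


def parse_search_results(xml_text):
    """Return a list of (title, id) pairs, one per entry."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        entry_id = entry.findtext("atom:id", default="", namespaces=NS)
        results.append((title, entry_id))
    return results


results = parse_search_results(SEARCH_RESULTS)
for title, entry_id in results:
    print(title, entry_id)
```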

To retrieve an existing entry, do an HTTP GET on the URL for that entry (as returned, for example, in the id for the entry in the search results). The URL scheme for entries is completely server-dependent; for instance, the examples in the spec have completely separate URLs for each entry, but my server implements them as parameterized URLs served off a single CGI script. The client has no way of knowing in advance what URL scheme the server will use, and it shouldn’t care, because all it has to do is look at the search results and use the URLs it’s been given.
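
To give a feel for how parameterized URLs can be served off a single CGI script, here is a sketch of the dispatch step. It is not the code in atom.cgi; parse_path_info and route are hypothetical names, and the action labels are invented. The only thing it relies on is standard CGI behavior: the web server hands the script whatever follows its name in the PATH_INFO environment variable, and the HTTP verb in REQUEST_METHOD.

```python
def parse_path_info(path_info):
    """Turn '/blog_id=14/entry_id=2357' into {'blog_id': '14', 'entry_id': '2357'}."""
    params = {}
    for segment in path_info.strip("/").split("/"):
        if not segment:
            continue
        key, _, value = segment.partition("=")
        params[key] = value
    return params


def route(method, path_info):
    """Pick an action from the HTTP verb and the parsed path (labels are made up)."""
    params = parse_path_info(path_info)
    if params.get("service") == "edit" and method == "GET":
        return "introspection"
    if params.get("service") == "search" and method == "GET":
        return "search"
    if "entry_id" in params:
        actions = {"GET": "retrieve", "PUT": "edit", "DELETE": "delete"}
        return actions.get(method, "unsupported")
    if "blog_id" in params and method == "POST":
        return "create"
    return "unsupported"


# In a real CGI script the two arguments would come from
# os.environ["REQUEST_METHOD"] and os.environ.get("PATH_INFO", "").
print(route("GET", "/blog_id=14/entry_id=2357"))
print(route("POST", "/blog_id=14"))
```

No .htaccess rewriting is involved; the extra path segments reach the script as-is.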

For example, doing an HTTP GET on http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357 returns this Atom entry:

<entry xmlns="http://purl.org/atom/ns#">
  <id>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357</id>
  <title>Automated post</title>
  <link>http://diveintomark.org/atom/archives/002357.html</link>
  <created>2003-08-12T23:53:03Z</created>
  <issued>2003-08-12T23:53:03Z</issued>
  <modified>2003-08-12T23:53:03Z</modified>
  <summary>An automated post</summary>
  <content type="text/html" mode="escaped">This is a test</content>
</entry>

For the purposes of this demonstration, retrieving entries does not require authentication.

To edit an existing entry, do an HTTP PUT on the URL for that entry (the same URL you would use to retrieve it), with the complete updated Atom entry in the body. (Editing existing entries requires authentication, which is described below.) So to change the title of the above entry, you would do an HTTP PUT on http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357 with this in the body of the HTTP message:

<entry xmlns="http://purl.org/atom/ns#">
  <id>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357</id>
  <title>Edited post</title>
  <link>http://diveintomark.org/atom/archives/002357.html</link>
  <created>2003-08-12T23:53:03Z</created>
  <issued>2003-08-12T23:53:03Z</issued>
  <modified>2003-08-12T23:53:03Z</modified>
  <summary>An automated post</summary>
  <content type="text/html" mode="escaped">This is a test</content>
</entry>
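
In Python, a client could prepare that PUT roughly as follows. This sketch only builds the request object and does not send it; the Content-Type value is an assumption borrowed from the type attribute on the LINK tag, and a real request would also need the authentication header, which is left out here.

```python
import urllib.request

ENTRY_URL = "http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357"

UPDATED_ENTRY = """<entry xmlns="http://purl.org/atom/ns#">
  <id>http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357</id>
  <title>Edited post</title>
  <link>http://diveintomark.org/atom/archives/002357.html</link>
  <created>2003-08-12T23:53:03Z</created>
  <issued>2003-08-12T23:53:03Z</issued>
  <modified>2003-08-12T23:53:03Z</modified>
  <summary>An automated post</summary>
  <content type="text/html" mode="escaped">This is a test</content>
</entry>"""


def build_put_request(url, entry_xml):
    """Build (but do not send) a PUT carrying the complete updated entry."""
    return urllib.request.Request(
        url,
        data=entry_xml.encode("utf-8"),
        # Assumed media type; the text above does not specify one for request bodies.
        headers={"Content-Type": "application/x.atom+xml"},
        method="PUT",
    )


request = build_put_request(ENTRY_URL, UPDATED_ENTRY)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```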

The server may, at its discretion, ignore certain fields that are not user-editable. For example, some servers might not allow the user to change the creation date of a post (because the server maintains it internally), so even if the PUT message contained a <created> element, the server would not change the creation date of the entry. This is not an error; the server should update the fields it can update, and ignore the rest.

Vendors may add their own custom fields, in a namespace, that do not duplicate the functionality of core Atom elements. For example, Movable Type has a number of entry-level flags, such as whether to allow comments. This is not handled by the core Atom API, but SixApart could define a namespace to contain their vendor-specific flags, and clients that were aware of these flags could include them in the body of the message. Servers should ignore any unknown namespaces.
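
On the server side, "update what you can and ignore the rest" might look something like the sketch below. The set of editable fields, the function name editable_fields, and the vendor namespace URI in the sample are all hypothetical; a real server would choose its own list and would handle any vendor namespaces it actually understands.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"

# Hypothetical policy: this server lets users change only these fields.
EDITABLE = {"title", "summary", "content"}

# Sample PUT body with a server-maintained field and a made-up vendor flag.
PUT_BODY = """<entry xmlns="http://purl.org/atom/ns#"
       xmlns:v="http://example.com/vendor-flags#">
  <title>Edited post</title>
  <created>2003-08-12T23:53:03Z</created>
  <summary>An automated post</summary>
  <v:allow-comments>1</v:allow-comments>
</entry>"""


def editable_fields(entry_xml):
    """Return only the fields this server is willing to update."""
    root = ET.fromstring(entry_xml)
    prefix = "{%s}" % ATOM_NS
    updates = {}
    for child in root:
        if not child.tag.startswith(prefix):
            continue  # unknown namespace: ignore it, do not raise an error
        name = child.tag[len(prefix):]
        if name in EDITABLE:
            updates[name] = child.text or ""
        # Non-editable Atom fields such as <created> are skipped silently.
    return updates


updates = editable_fields(PUT_BODY)
print(updates)
```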

To create a new entry, do an HTTP POST on the URL given for <create-entry> in the introspection file. In this case, that was http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14. Put an Atom entry in the body of the HTTP message. If the creation succeeded, the server will respond with an HTTP status code of “201 Created”, and the URL of the new entry will be in the Location: header. For example, when I originally created the entry above, this is what the server returned:

Status: 201 Created
Location: http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357
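
A CGI script sets the status code by printing a Status: line ahead of its other headers, so producing and reading that response can be sketched as below. Both function names are made up for this example, and the parser is deliberately naive (one header per line, no folding).

```python
def created_response(location):
    """Build the CGI header block for a successful creation."""
    return "Status: 201 Created\r\nLocation: %s\r\n\r\n" % location


def parse_headers(text):
    """Naive header parser for the sketch: lower-cased name -> value."""
    headers = {}
    for line in text.split("\r\n"):
        if not line:
            break  # blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


NEW_ENTRY = "http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14/entry_id=2357"
headers = parse_headers(created_response(NEW_ENTRY))
print(headers["status"])
print(headers["location"])
```

The client should remember the Location value, since that is the URL it will later GET, PUT, or DELETE.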

To delete an existing entry, do an HTTP DELETE on the URL for that entry. On successful deletion, the server simply responds with an HTTP status code 200.

Now, I mentioned that several actions (creating a new entry, editing an existing entry, and deleting an entry) require authentication. In my implementation, authentication is handled with a variation of HTTP Digest Authentication (RFC 2617). It is important to note that, while HTTP Digest Authentication has been around for years, our variation of it is highly experimental. Draft 7 of the Atom API defines no authentication mechanism at all. This is a prototype of how we might implement an application-level authentication mechanism with the strength of HTTP Digest Authentication.

We decided to use a scheme like HTTP Digest Authentication because

  1. It never sends plaintext passwords over the wire.
  2. It does not require either client or server to store plaintext passwords on disk. They can both store a nonreversible (one-way) hash of the password in a specific format.
  3. It can protect against replay attacks, whereby an attacker can sniff one valid transaction and use that information to create another valid transaction, even though they don’t know your password.

However, we decided against relying on Apache’s implementation of HTTP Digest Authentication as-is because

  1. It requires an Apache module that is not installed by default, and that many people (including myself) do not have installed.
  2. It is handled entirely by Apache; no authentication-related information is passed on to the script. This means it must be maintained entirely with Apache-level tools (.htaccess and .htpasswd files), which is a major maintenance headache. Server-side API implementations do not exist in a vacuum; they are almost certainly part of a larger application which already manages users and passwords (for example, by storing them in a database). In an informal poll among vendors, the overwhelming consensus was that they wanted an application-level authentication mechanism so they could re-use their existing infrastructure that manages users and passwords.

So this implementation of the Atom API recreates HTTP Digest Authentication at the application level, with the following changes:

For those not familiar with the inner workings of Digest Authentication, here’s how Atom authentication works:

  1. The client tries to do something that requires authentication, for instance, POSTing a new entry to http://diveintomark.org/cgi-bin/atom.cgi/blog_id=14. The server sends back an HTTP error code of “447 Atom unauthorized”, and an Atom-Authenticate header like this:

    Atom-Authenticate: Digest realm="dive into atom", qop="atom-auth", algorithm="SHA", nonce="some unique server-specific value"

  2. The client takes the username, the realm given by the server, and the password, and concatenates them to create an intermediate value which we’ll call A1:

    A1 = username + ":" + realm + ":" + password

  3. The client takes the HTTP verb it wants to use (in this case “POST”) and the path part of the URL it wants to post to (in this case “/cgi-bin/atom.cgi/blog_id=14”), and concatenates them into an intermediate value which we’ll call A2:

    A2 = verb + ":" + uri

  4. The client creates a unique client-specific value, which we’ll call “cnonce”. How this happens is completely client-specific, but it should change on every request, and future values should not be guessable.

  5. The client takes A1, A2, the qop given by the server, the nonce given by the server, and the cnonce created by the client, and creates a digest, which we’ll call “response”:

    response = sha(sha(A1) + ":" + nonce + ":" + "00000001" + ":" + cnonce + ":" + qop + ":" + sha(A2))

    (Thanks to Sjoerd Visscher for pointing out that, while the implementation was correct, the original version of this documentation contained a critical error. This is the updated version.)

  6. The client resends its original request, with the addition of an Atom-Authentication header with all of the following values filled in:

    Atom-Authentication: Digest username="...", realm="...", nonce="...", uri="...", qop="atom-auth", nc="00000001", cnonce="...", response="..."

  7. If the username/password is not valid, the server will respond with an HTTP error code 403, and a new Atom-Authenticate header, and the client starts all over.

    If the client screwed something up (forgot a value, sent a malformed authentication request), the server will respond with an HTTP error code 400, and a new Atom-Authenticate header.

    If the client successfully authenticated, the server will do what the client asked (in this case, post a new entry). Every subsequent response from the server may contain an Atom-Authentication-Info header that includes a “nextnonce” value. If present, the client must discard the previous nonce value and use the new nonce value to recalculate the digest response on the next request. (This protects against replay attacks.) The client should cache and reuse the other values given by the server (realm, qop, algorithm). Either way, only one extra round trip is required per session (before the first action, to get the initial authentication challenge). The client does not need to do additional round trips once they have successfully authenticated, as long as they stay current with their nonce values.

    If the server does not return a new nonce value, the client should continue using the old one, and increment the value of nc as a hexadecimal number, and recalculate the digest response. So on the second request, the client would recalculate the response like this:

    response = sha(sha(A1) + ":" + nonce + ":" + "00000002" + ":" + cnonce + ":" + qop + ":" + sha(A2))
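
The client-side steps above can be sketched in Python as follows. Two things here are assumptions rather than facts from the description: that algorithm="SHA" means SHA-1 with a lowercase hex digest (mirroring how RFC 2617 uses hex MD5), and that the cnonce is 16 random bytes in hex. The username, password, and nonce values are placeholders.

```python
import hashlib
import secrets


def sha(text):
    # Assumption: "SHA" means SHA-1, hex-encoded.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, verb, uri,
                    nonce, cnonce, nc=1, qop="atom-auth"):
    """Compute the 'response' value from steps 2 through 5."""
    a1 = username + ":" + realm + ":" + password
    a2 = verb + ":" + uri
    nc_text = "%08x" % nc  # nc is an 8-digit hexadecimal counter
    return sha(":".join([sha(a1), nonce, nc_text, cnonce, qop, sha(a2)]))


def authentication_header(username, realm, password, verb, uri,
                          nonce, nc=1, qop="atom-auth"):
    """Build the Atom-Authentication header value from step 6."""
    cnonce = secrets.token_hex(16)  # fresh and unguessable on every request
    response = digest_response(username, realm, password, verb, uri,
                               nonce, cnonce, nc, qop)
    fields = [
        ("username", username), ("realm", realm), ("nonce", nonce),
        ("uri", uri), ("qop", qop), ("nc", "%08x" % nc),
        ("cnonce", cnonce), ("response", response),
    ]
    return "Digest " + ", ".join('%s="%s"' % field for field in fields)


# Placeholder credentials and nonce, for illustration only.
print("Atom-Authentication: " + authentication_header(
    "mark", "dive into atom", "secret",
    "POST", "/cgi-bin/atom.cgi/blog_id=14", "server-nonce"))
```

If the server does not send a nextnonce, the next request would call the same function with nc=2 and the old nonce; if it does, the client starts again with the new nonce.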

Update (2003-08-25): the algorithm has changed slightly in response to feedback from the PHP community and the Apache community. This is the new algorithm.

This authentication scheme may appear convoluted at first glance, but anything simpler would be vulnerable to a variety of different attacks. You can read RFC 2617 for all the details on those attacks, and how this system protects against them.
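
To make the server's half of the exchange concrete, here is a sketch of how a server could check an incoming Atom-Authentication header while storing only sha(A1) for each user, never the plaintext password. This is not the code in atom.cgi. The function name check_auth, the in-memory dictionaries, and the rule that a repeated nc for the same nonce is rejected are all my assumptions; SHA-1 with hex output is assumed as well, and the check that the nonce was actually issued by this server is omitted for brevity.

```python
import hashlib
import hmac

REQUIRED = ("username", "realm", "nonce", "uri", "qop", "nc", "cnonce", "response")


def sha(text):
    # Assumption: "SHA" means SHA-1, hex-encoded.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def check_auth(fields, verb, stored_ha1, last_nc):
    """Return 200 (proceed), 400 (malformed) or 403 (rejected).

    fields:     dict of values parsed from the Atom-Authentication header
    stored_ha1: dict mapping username -> sha(A1)
    last_nc:    dict mapping nonce -> highest nc accepted so far
    """
    if any(name not in fields for name in REQUIRED):
        return 400
    try:
        nc = int(fields["nc"], 16)
    except ValueError:
        return 400
    ha1 = stored_ha1.get(fields["username"])
    if ha1 is None:
        return 403
    if nc <= last_nc.get(fields["nonce"], 0):
        return 403  # assumed policy: a reused nc looks like a replay
    expected = sha(":".join([
        ha1, fields["nonce"], fields["nc"], fields["cnonce"],
        fields["qop"], sha(verb + ":" + fields["uri"]),
    ]))
    if not hmac.compare_digest(expected.encode("utf-8"),
                               fields["response"].encode("utf-8")):
        return 403
    last_nc[fields["nonce"]] = nc
    return 200
```

A request captured off the wire fails the nc check when it is sent a second time, and an attacker cannot mint a fresh one without knowing sha(A1).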

This authentication scheme is not yet part of the Atom API. It is a proposal for an authentication scheme that satisfies the requirements we’ve discussed. Other authentication schemes could also satisfy those requirements, but I believe that this one is the simplest thing that could possibly work.

Sample source code for this implementation (atom.cgi) is available in atom-api-20030818.py. You will also need minidomhack.py, which works around a small bug in Python 2.2 (fixed in 2.3). This is Python source code that is specific to my environment (for example, it does direct SQL queries to my Movable Type database), so it is not readily deployable on other servers without serious hacking. This is my own implementation; it is not associated with Movable Type or SixApart.

Sample source code for a GUI client that works with this implementation is available in wxAtomClient_18Aug2003. It requires Python and wxPython, and has only been tested on Windows.

Update (2003-08-25): updated client and server code is now available. See Joe’s latest post for details on the changes.

Discussion on RestEchoApiDiscuss, or atom-syntax.


© 2001–9 Mark Pilgrim