This is a usability rant. If you don’t like rants, or don’t care about usability, move on.

One of my biggest pet peeves about the current generation of news aggregators is that they don’t work very hard at finding RSS feeds. The other side of this coin is their stubborn insistence on making RSS visible. RSS should be completely invisible. Browsers make HTML invisible. They even make the URL invisible (by auto-completing the http://, and in many cases the www. and .com as well). Yeah, from a technical standpoint, it’s cool that web sites offer their content in XML. But end users don’t care about XML. They just want to read stuff.

But to do that, they first have to subscribe. I’ve tried all the leading desktop news aggregators on the market, and the process of subscribing to sites is incredibly painful. Few aggregators support RSS autodiscovery, which would allow users to type diveintomark.org and subscribe to my RSS feed. Instead, they insist that I enter the address of that RSS feed directly. Some are even proud of this, and go to great lengths to teach me how to find RSS feeds myself. This is the opposite of usability.

Here’s what I want:

  1. If I give you the address of a feed, obviously you should take it. Fine.
  2. However, if I give you the main address of a site, you should download the home page and look for LINK tags that point to feeds. If you find any, use them. This was the entire purpose of RSS autodiscovery. Aggie gets this right. However, Aggie (at least RC3, I know the next version is forthcoming) insists that I preface the address with http://. Don’t do that. Browsers don’t do that.
  3. If the site doesn’t support RSS autodiscovery, don’t give up. Maybe the page has a regular link to its feed. (Many weblogs do.) Scan all the links on the page, and guess intelligently about which of them point to a feed.
  4. Links to addresses on the same server that end in .rss, .rdf, or .xml are prime candidates for being feeds. Gather them all up and see which ones are actually feeds. (Yes, this means downloading them and looking at them. You’re going to do that later anyway, so do it now. Don’t ask me about it, don’t boast about it, just do it. Do all the hard work so I don’t have to.)
  5. No luck? Look for links to addresses on the same server that contain rss, rdf, or xml anywhere in the address. Verify any you find.
  6. Still no luck? Repeat the previous two steps, in order, but expand the search to include addresses on external servers. (Many Blogger weblogs use a third-party service to RSSify their site.) Weed out 127.0.0.1 addresses. Verify any you find.
  7. Still no luck? Syndic8 keeps track of over 8,000 feeds, and it has an XML-RPC interface. See if it knows about any feeds associated with this stubborn address. Weed out any whose status is not Syndicated. This may also find scraped feeds for many major news sites, like CNN, that don’t provide their own. So much the better; most users would never have found those.
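
To make the list concrete, here is a rough sketch of steps 1 through 6 in Python. It is not the code from rssfinder.py; every name in it (normalize_address, LinkCollector, candidate_feeds, FEED_TYPES) is made up for illustration, and the set of MIME types treated as "a feed" is an assumption. It works on a page you have already downloaded and only ranks candidates. A real aggregator would still have to fetch each candidate and verify it is a feed, falling through to the next step when none verify. The Syndic8 lookup in step 7 is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: MIME types that mark a LINK tag as pointing to a feed.
FEED_TYPES = {"application/rss+xml", "application/rdf+xml", "application/atom+xml"}
FEED_SUFFIXES = (".rss", ".rdf", ".xml")
FEED_HINTS = ("rss", "rdf", "xml")


def normalize_address(address):
    """Accept 'diveintomark.org' the way a browser would: add http:// if missing."""
    address = address.strip()
    return address if "://" in address else "http://" + address


class LinkCollector(HTMLParser):
    """Collects autodiscovery LINK hrefs and ordinary A hrefs from a page."""

    def __init__(self):
        super().__init__()
        self.autodiscovery = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        if tag == "link":
            rels = (attributes.get("rel") or "").lower().split()
            mime = (attributes.get("type") or "").lower()
            if "alternate" in rels and mime in FEED_TYPES:
                self.autodiscovery.append(href)
        elif tag == "a":
            self.anchors.append(href)


def ends_like_feed(url):
    return urlparse(url).path.lower().endswith(FEED_SUFFIXES)


def mentions_feed(url):
    return any(hint in url.lower() for hint in FEED_HINTS)


def candidate_feeds(html, base_url):
    """Return the best group of candidate feed URLs, most trustworthy step first."""
    collector = LinkCollector()
    collector.feed(html)

    # Step 2: LINK tags win outright.
    if collector.autodiscovery:
        return [urljoin(base_url, href) for href in collector.autodiscovery]

    # Steps 3-6: fall back to ordinary links, minus 127.0.0.1 addresses.
    host = urlparse(base_url).hostname
    links = list(dict.fromkeys(urljoin(base_url, h) for h in collector.anchors))
    links = [u for u in links if urlparse(u).hostname != "127.0.0.1"]
    local = [u for u in links if urlparse(u).hostname == host]
    external = [u for u in links if urlparse(u).hostname != host]

    # Same server first (steps 4 and 5), then external servers (step 6).
    for pool in (local, external):
        for looks_right in (ends_like_feed, mentions_feed):
            found = [u for u in pool if looks_right(u)]
            if found:
                return found
    return []
```

Calling candidate_feeds(page_html, normalize_address("diveintomark.org")) on a home page that carries an autodiscovery LINK tag returns just the address from that tag; on a page without one, it returns whichever group of ordinary links matched first.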

End of rant. Beginning of solution. rssfinder.py. GPL-licensed. It does all that.

§


© 2001–9 Mark Pilgrim