Every few months, somebody floats the idea of doing away with RSS and replacing it with HTML or XHTML, because semantic markup is all we need. Last July, Joe Gregorio asked Why do we need RSS?
Brad Wilson responded with a few good reasons (all of which still apply). Joe floated an XHTML+RSS specification anyway, which was discussed and withdrawn and now seems to have gone 404.
Now, like heartburn that persists 2 or more times a week even though you’ve treated it and changed your diet, the idea is back again. Ban RSS! XHTML is all we need! It’s like, uh, semantic and stuff.
- Anil Dash: syndication formats?
- Scott Andrew: Semantic overloading.
- Tantek Çelik: XHTML vs. the world and More on XHTML syndication.
- Stuart Langridge: XHTML instead of RSS.
- techno-weenie: XHTML syndication.
- Paul Freeman: The XHTML syndication debate.
- Aaron Swartz: The RSS Rebellion.
- Ian Hickson: Why semantic markup is so important.
- Sam Ruby: Extensibility and convergence.
- Shelley Powers: RSS Push Back.
I’ve talked about application posture before; it seems to me that this latest movement adopts the wrong posture. The entire success of RSS is predicated on the principle that you can keep doing whatever messed up stuff you’ve always done on your web pages… oh, and do this other thing too. Look, it’s simple, you can code it up in an hour with a few print statements and an escape function.
By contrast, this latest XHTML-as-syndication movement seems to be based on the principle that syndication is so incredibly important that you must immediately stop whatever you’re doing with your web pages, upgrade to XHTML, validate your markup, restructure your home page to include all and only the content you’re willing to syndicate, and by the way, would you please unlearn that ugly nasty presentational page layout language you’ve been using for years and learn this wonderful happy structured semantic markup language instead?
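To give a feel for how little the "few print statements and an escape function" approach involves, here is a minimal sketch in Python. It is not anyone's actual publishing code: the function name `write_rss` and the sample channel and item data are made up for illustration, and it emits only the three required channel elements plus title, link, and description for each item.

```python
import io
from xml.sax.saxutils import escape  # the "escape function"


def write_rss(out, channel, items):
    """Write a bare-bones RSS 2.0 feed to `out` using plain print calls."""
    print('<?xml version="1.0"?>', file=out)
    print('<rss version="2.0">', file=out)
    print("<channel>", file=out)
    for tag in ("title", "link", "description"):
        print("<%s>%s</%s>" % (tag, escape(channel[tag]), tag), file=out)
    for item in items:
        print("<item>", file=out)
        for tag in ("title", "link", "description"):
            print("<%s>%s</%s>" % (tag, escape(item[tag]), tag), file=out)
        print("</item>", file=out)
    print("</channel>", file=out)
    print("</rss>", file=out)


# Hypothetical sample data, for illustration only.
channel = {
    "title": "Example & Co. weblog",
    "link": "http://example.com/",
    "description": "Made-up posts",
}
items = [
    {
        "title": "Tags <b> & such",
        "link": "http://example.com/1",
        "description": "First post",
    }
]

buf = io.StringIO()
write_rss(buf, channel, items)
feed_xml = buf.getvalue()
print(feed_xml)
```

Nothing here cares what the home page markup looks like, which is the whole point: the feed is an extra output bolted onto whatever the publisher already does.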
It should be obvious to any rational observer that this will go nowhere fast. A syndication format that requires valid semantic XHTML markup? Spare me. 9 out of 10 bloggers can’t even spell XHTML. This has all been tried before (Dan Brickley wrote an XHTML-to-RSS Extractor service over 2 years ago), and went nowhere, because no one’s markup is up to snuff. Even the more liberal services that work via regular expressions (such as Julian Bond’s RSSify) are only used as a last resort by people who can’t produce an RSS feed any other way.
Then there’s the bandwidth issue. Think RSS doesn’t save bandwidth? If-Modified-Since headers to the rescue? Think again. I scraped Tantek’s page; the resulting RSS feed was 13K. His home page is 78K. Even assuming a best case scenario of an unchanging home page with proper If-Modified-Since or ETag headers, are you telling me that whenever he posts something new, I’ll have to download 6 times as much data because semantic XHTML is all we need? Thanks for nothing.
To make things worse, many people have dynamic home pages, or pages that change frequently (constantly being updated with comment counts, trackback counts, pingback listings, linkbacks, or other external content). A single subscriber banging away once an hour on an ever-changing 78K home page will generate almost 2MB of traffic per day. That same subscriber banging away once an hour on a 13K RSS feed that only changes twice a day will generate about 30K of traffic.
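The arithmetic behind those two figures can be sketched as follows. This is a back-of-the-envelope model, not a measurement: the helper `daily_traffic_kb` is hypothetical, it assumes each change is picked up by exactly one poll, and it assumes every other poll gets a 304 Not Modified response whose size you supply (zero by default).

```python
def daily_traffic_kb(size_kb, polls_per_day, changes_per_day, not_modified_kb=0.0):
    """Estimate KB transferred per subscriber per day with conditional GET.

    Assumption: each change triggers one full download; all other polls
    receive a 304 response costing `not_modified_kb`.
    """
    full_downloads = min(polls_per_day, changes_per_day)
    cheap_responses = polls_per_day - full_downloads
    return full_downloads * size_kb + cheap_responses * not_modified_kb


# Hourly polling of a 78K home page that has changed every time.
home_page_kb = daily_traffic_kb(size_kb=78, polls_per_day=24, changes_per_day=24)

# Hourly polling of a 13K feed that changes twice a day.
rss_feed_kb = daily_traffic_kb(size_kb=13, polls_per_day=24, changes_per_day=2)

print("home page: %.0fK per day" % home_page_kb)  # home page: 1872K per day
print("RSS feed:  %.0fK per day" % rss_feed_kb)   # RSS feed:  26K per day
```

1872K is the "almost 2MB" above. The feed comes to 26K of body data, and a couple of hundred bytes of headers on each of the 22 unchanged polls brings it to roughly the 30K quoted.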
Think this is a strawman argument, pitting the best case on one side against the worst case on the other? My own home page uses extremely optimized markup; it’s currently 68K, but it changes every hour because I include my further reading linkbacks inline. My RSS feed, with exactly the same number of posts, including both custom-written plaintext descriptions and full posts in <content:encoded>, is 38K, and it only changes when I post new articles. I get 10,000 hits a day on my RSS feeds. Do the math.
(Let us also pass over in silence the use cases that these rebels don’t want to think about. Shelley only wants to provide excerpts in her feed, to minimize wholesale republication, but wants to publish full posts on her home page. Large commercial sites generally only want to syndicate excerpts because they want to drive page hits and banner ad revenue. Dean Allen provides hand-crafted descriptions in his feed (an excellent technique which I have since adopted). Mena Trott has experimented with RSS-only content. Lambda The Ultimate uses external links instead of permalinks for the item/link, thus breaking the rel="bookmark" semantics. (Mena does this too on some of her feeds.) I have custom RSS feeds, like my further reading feed that lists recent referrers, that contain data that has already fallen off my home page or was never there in the first place. I include item-level dates in my RSS feed but not on my site. And how do we specify dates in XHTML anyway? Aaron’s RSS: XHTML Profile says the element with class="date" is the date but doesn’t mandate a date format, making it worthless. Think you can write intelligent code to handle them all? Dorothea Salo displays her dates in Latin. The list of exceptions and limitations goes on and on.)
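To see why a class="date" convention without a mandated format is so hard to consume, consider a small sketch. The date string below is a made-up example of what such an element might contain; both parses succeed, and nothing in the markup tells a scraper which one the publisher meant.

```python
from datetime import datetime

# Hypothetical text found inside an element with class="date".
raw_date = "03/04/2003"

# Two equally plausible readings of the same string.
month_first = datetime.strptime(raw_date, "%m/%d/%Y")
day_first = datetime.strptime(raw_date, "%d/%m/%Y")

print(month_first.date())  # 2003-03-04
print(day_first.date())    # 2003-04-03
```

And that is the easy case, where the date is at least numeric. Dates written out in prose, or in Latin, would need a parser per publisher.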
Syndication is not publication. Your syndicated feed is not your home page plus a little, or your home page minus a little, or your home page in plain text. It’s something else, a different medium. Maybe less, maybe more, but different. Syndication has its own strengths, its own weaknesses, its own best practices. And of all the syndication formats we could be using, RSS is the worst one possible. Except, of course, for all the others.
I’ll tell you what, though: clean semantic markup does make it easier to scrape content from stubborn publishers that don’t provide their own RSS feeds. For those who don’t care about the quality of their feeds and are willing to take the bandwidth hit (you must not pay for bandwidth!), you’ll be happy to know that next-generation newsreaders will have screen scrapers built in. RssDistiller is a third-party product that already does this for Radio; NewsMonster allegedly has the functionality built-in (although I never did get it to work, it’s pre-alpha); and a little birdie told me that the next version of Aggie will have some sort of screen-scraping plugin architecture too. Page layout changed and my scraped feed broke overnight? My next-generation newsreader will contact a central server (or a network of servers) and automatically download an updated plugin. If you want control over your syndicated feeds, now would be a good time to start providing your own; otherwise you’ll be viewed as a nuisance, a maintenance headache, and people will route around you.

