There’s a good debate brewing about evolvable formats over on XML-DEV.
Mike Champion replies: XML is overkill for Really Simple Syndication formats that
don’t need the hierarchical and recursive structure that XML supports,
because without the draconian error handling and namespace
well-formedness constraints …
XML is just a verbose way of labelling text values.
Bryan continues: How about I want to
use your RSS for some sort of Mobile phone display, AvantGO, or what
have you, but your encoded HTML prevents me – or more likely I just
parse your encoded HTML out (which is what I will do with any feed I get
with encoded HTML in it) and your feed runs the risk of becoming
nonsensical…
I replied privately to Bryan and asked him to forward my reply to the list. He did not, but he did reply to my reply and quoted most of the relevant bits of my message, so you can read both there.
I say: [Entity-encoded HTML in
description] is probably the #1 most hated feature
of the RSS 0.9x/2.0 format among RSS consumers/developers (since it
means much more work for them to strip out HTML tags they don’t
want, or format things intelligently in an all-text display, or
whatever… and I know firsthand how much this sucks, because
I’ve done it). … But entity-encoded HTML is also probably
the #1 most loved feature of RSS 0.9x/2.0 among RSS content
producers.
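To make the consumer's side of this concrete, here is a minimal sketch of the chore described above: pulling an entity-encoded description out of an item and reducing it to plain text for an all-text display. It uses only the Python standard library. The names `TextOnly`, `description_as_text`, and `ITEM` are made up for illustration, and a real aggregator would need far more care (block-level spacing, scripts, broken character references, and so on).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextOnly(HTMLParser):
    """Hypothetical helper: keeps text nodes and silently drops every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def description_as_text(item_xml):
    # The XML parser undoes the entity encoding, so .findtext() hands back
    # raw (possibly sloppy) HTML rather than structured markup.
    html_fragment = ET.fromstring(item_xml).findtext("description", default="")
    stripper = TextOnly()
    stripper.feed(html_fragment)
    stripper.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join("".join(stripper.parts).split())


# Illustrative item, not taken from any real feed.
ITEM = (
    "<item><description>"
    "Hello &lt;b&gt;bold&lt;/b&gt; world &amp;amp; more"
    "</description></item>"
)

print(description_as_text(ITEM))
```

Note that the consumer has to run two parsers back to back, an XML one and a forgiving HTML one, just to get at the words.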
Bryan replies: I have a difficult time making this distinction between a content
producer and a consumer/developer. If you produce content you want the
content consumed, a good producer would thus give some thought to want
content consumers can consume.
I have not yet seen a direct response to this point, so let me say here and now that no, in my experience, RSS content producers pay virtually no attention at all to what their content looks like or how difficult it is for content consumers to consume. Many are oblivious that they are producing content at all.
Meanwhile, Mike Champion continues wondering why RSS is XML at all. If you’re rolling your own
tools and not leveraging XML tools, ask yourself what value XML
offers you or your users.
A valid (if moot) point (the decision was made years ago by a company that no longer cares), but then he veers off track and completely misses the point: Why on earth would one even THINK about using entity-encoded
non-well-formed HTML in a syndication format??? Use the HTML
tags, but close them! Use tidy to clean up the junk you get
from your users!
That’s an easy one: because it’s the easiest possible solution for content producers. Few publishing systems produce valid (X)HTML, whether they be $42,000-per-processor content management systems or $39.95/free/open source weblogging systems. (Virtually) nobody knows about, cares about, or could properly use Tidy. Requiring valid markup to produce rich content would kill the format. I’m not happy about it either, but there it is.
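A rough sketch of why this is the path of least resistance for producers, using only the Python standard library: one call to an escaping function turns arbitrary tag soup into character data that any XML parser will accept, whereas dropping the same markup inline breaks the feed. The function names and the `TAG_SOUP` sample are invented for illustration and do not come from any particular publishing system.

```python
from html import escape
import xml.etree.ElementTree as ET


def make_item(title, body_html):
    # escape() rewrites &, < and > as entities, so whatever the publishing
    # system emitted becomes plain character data inside <description>.
    return "<item><title>{}</title><description>{}</description></item>".format(
        escape(title, quote=False), escape(body_html, quote=False)
    )


def inline_is_well_formed(body_html):
    # Try the alternative: paste the markup in as real child elements.
    try:
        ET.fromstring("<item><description>" + body_html + "</description></item>")
        return True
    except ET.ParseError:
        return False


# Hypothetical output of a typical publishing tool: unclosed tags,
# a bare ampersand, an unquoted attribute.
TAG_SOUP = "<P>Unclosed <B>bold & <IMG SRC=pic.gif> text"

item = make_item("Example", TAG_SOUP)
round_tripped = ET.fromstring(item).findtext("description")

print(round_tripped == TAG_SOUP)        # the consumer gets the soup back intact
print(inline_is_well_formed(TAG_SOUP))  # the same soup inline is not XML
```

The producer's cost is one function call; all of the cleanup is deferred to whoever reads the feed.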
Bryan continues bashing entity-encoded HTML: Probably people who know html and
know that they want such and such a line of text to be bold. And since
RSS 0.92 allows them to get it bold via entity-encoded HTML they do so.
… I would be
surprised if I saw a bigger company doing it – but then again when I was
on Adobe’s website recently with my newest build of Mozilla and they
told me to get a new browser that understood javascript so who knows
what the biggies will do.
Heh, you obviously haven’t viewed-source on microsoft.com recently, or tried validating it. Actually, it’s impossible to validate, since it doesn’t declare a DOCTYPE, but even with DOCTYPE override, it seems to be stuck between two worlds. It uses XHTML-ish minimization syntax for single-tag elements like META and IMG, so it doesn’t validate as HTML, but it uses exclusively uppercase tags, so it doesn’t validate as XHTML. I feel like that guy in The Princess Bride trying to figure out which cup has iocaine powder in it…
Joe Gregorio steps in and nails the point about imperfect tools: The other aspect is that many people implementing RSS may not
have read the RSS spec (never mind the XML spec) they’re just
using an example RSS file as boilerplate. Again, another ‘tools’
issue. Paraphrasing a conversation
I had with another developer when he was talking about creating an RSS feed:
‘I thought to my self, I could do this the *right* way and use
the DOM API in my scripting language and have it take me an hour,
or I could just use printf and be done in 10 minutes.
I did the printf thing, it’s just a blog.’
Actually, I was present at that discussion (it was at our last RTP Bloggers Lunch), and I believe the person actually said ‘I could do this the right way and use the DOM API in my scripting language and have it take me all night …’
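For anyone who has not faced that choice, here is a small sketch of the two approaches in Python, with string formatting standing in for printf and `xml.dom.minidom` standing in for "the DOM API in my scripting language". The function names and the sample title are hypothetical. The shortcut works until a title contains an ampersand or an angle bracket.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET


def item_with_printf(title):
    # The ten-minute version: paste the value straight into a template.
    return "<item><title>%s</title></item>" % title


def item_with_dom(title):
    # The slow version: build nodes and let the serializer do the escaping.
    doc = minidom.Document()
    item = doc.createElement("item")
    title_el = doc.createElement("title")
    title_el.appendChild(doc.createTextNode(title))
    item.appendChild(title_el)
    doc.appendChild(item)
    return item.toxml()


def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical title with the two characters that trip up templates.
TITLE = "Q&A: is <b> allowed?"

print(is_well_formed(item_with_printf(TITLE)))
print(is_well_formed(item_with_dom(TITLE)))
```

The template version is perfectly fine for titles like "Hello world", which is exactly why nobody notices the problem until much later.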
§
© 2001–9 Mark Pilgrim