dive into mark


Thursday, October 10, 2002

More on evolvable formats

There’s a good debate brewing about evolvable formats over on XML-DEV.

  1. Dave Winer starts.

  2. Mike Champion replies: “XML is overkill for Really Simple Syndication formats that don’t need the hierarchical and recursive structure that XML supports, because without the draconian error handling and namespace well-formedness constraints … XML is just a verbose way of labelling text values.”

  3. Bryan continues: “How about I want to use your RSS for some sort of Mobile phone display, AvantGO, or what have you, but your encoded HTML prevents me - or more likely I just parse your encoded HTML out (which is what I will do with any feed I get with encoded HTML in it) and your feed runs the risk of becoming nonsensical…”

  4. I replied privately to Bryan and asked him to forward my reply to the list. He did not, but he did reply to my reply and quoted most of the relevant bits of my message, so you can read both there.

    I say: “[Entity-encoded HTML in description] is probably the #1 most hated feature of the RSS 0.9x/2.0 format among RSS consumers/developers (since it means much more work for them to strip out HTML tags they don't want, or format things intelligently in an all-text display, or whatever... and I know firsthand how much this sucks, because I've done it). ... But entity-encoded HTML is also probably the #1 most loved feature of RSS 0.9x/2.0 among RSS content producers.”

    Bryan replies: “I have a difficult time making this distinction between a content producer and a consumer/developer. If you produce content you want the content consumed, a good producer would thus give some thought to what content consumers can consume.”

    I have not yet seen a direct response to this point, so let me say here and now that no, in my experience, RSS content producers pay virtually no attention at all to what their content looks like or how difficult it is for content consumers to consume. Many are oblivious that they are producing content at all.

  5. Meanwhile, Mike Champion continues wondering why RSS is XML at all: “If you’re rolling your own tools and not leveraging XML tools, ask yourself what value XML offers you or your users.”

    A valid (if moot) point (the decision was made years ago by a company that no longer cares), but then he veers off track and completely misses the point: “Why on earth would one even THINK about using entity-encoded non-well-formed HTML in a syndication format??? Use the HTML tags, but close them! Use tidy to clean up the junk you get from your users!”

    That’s an easy one: because it’s the easiest possible solution for content producers. Few publishing systems produce valid (X)HTML, whether they be $42,000-per-processor content management systems or $39.95/free/open source weblogging systems. (Virtually) nobody knows about, cares about, or could properly use Tidy. Requiring valid markup to produce rich content would kill the format. I’m not happy about it either, but there it is.

  6. Bryan continues bashing entity-encoded HTML: “Probably people who know html and know that they want such and such a line of text to be bold. And since RSS 0.92 allows them to get it bold via entity-encoded HTML they do so. … I would be surprised if I saw a bigger company doing it - but then again when I was on Adobe’s website recently with my newest build of Mozilla and they told me to get a new browser that understood javascript so who knows what the biggies will do.”

    Heh, you obviously haven’t viewed-source on microsoft.com recently, or tried validating it. Actually, it’s impossible to validate, since it doesn’t declare a DOCTYPE, but even with DOCTYPE override, it seems to be stuck between two worlds. It uses XHTML-ish minimization syntax for single-tag elements like META and IMG, so it doesn’t validate as HTML, but it uses exclusively uppercase tags, so it doesn’t validate as XHTML. I feel like that guy in The Princess Bride trying to figure out which cup has iocaine powder in it…

  7. Joe Gregorio steps in and nails the point about imperfect tools: “The other aspect is that many people implementing RSS may not have read the RSS spec (never mind the XML spec) they’re just using an example RSS file as boilerplate. Again, another ‘tools’ issue. Paraphrasing a conversation I had with another developer when he was talking about creating an RSS feed: ‘I thought to my self, I could do this the *right* way and use the DOM API in my scripting language and have it take me an hour, or I could just use printf and be done in 10 minutes. I did the printf thing, it’s just a blog.’”

    Actually, I was present at that discussion (it was at our last RTP Bloggers Lunch), and I believe the person actually said, “I could do this the right way and use the DOM API in my scripting language and have it take me all night.”
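To make the printf-versus-API tradeoff concrete, here is a minimal sketch in Python, using only the standard library. Everything in it is an assumption for illustration: the function names (item_with_format, item_with_api, strip_tags) are hypothetical, the sample markup is made up, and this is not the code anyone in the thread actually wrote. It shows a producer pasting sloppy HTML into a template, the same producer letting an XML library entity-encode it, and a consumer stripping the tags back out for an all-text display.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def item_with_format(title, description_html):
    # The "printf" approach: paste values straight into a template.
    # Nothing is escaped, so any markup lands in the feed as-is.
    return ("<item><title>%s</title><description>%s</description></item>"
            % (title, description_html))


def item_with_api(title, description_html):
    # The "right way": build elements and let the serializer escape the
    # text, which yields entity-encoded HTML inside <description>.
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = description_html
    return ET.tostring(item, encoding="unicode")


def is_well_formed(xml_text):
    # A feed is usable by XML tools only if a parser accepts it.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


class TextOnly(HTMLParser):
    # Collects character data and ignores every tag.
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html_text):
    # Consumer side: tolerate non-well-formed HTML and keep only the text.
    parser = TextOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical content, the kind a publishing tool might emit.
sloppy_html = "Some <b>bold text<br>and an unclosed tag"

print(is_well_formed(item_with_format("Hello", sloppy_html)))
print(is_well_formed(item_with_api("Hello", sloppy_html)))

# Round trip: the consumer gets the original HTML back, then strips it.
description = ET.fromstring(item_with_api("Hello", sloppy_html)).find("description").text
print(strip_tags(description))
```

The template version breaks as soon as the content contains an unclosed tag, while the library version stays well-formed precisely because it entity-encodes the HTML. That pushes the cleanup work onto the consumer, which is the tradeoff the whole thread is arguing about.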





© 2001-8 Mark Pilgrim