Joe Gregorio: Against an ultra-liberal RSS parser.

But if we all build parsers that accept non well-formed XML then where is the motivation to fix those feeds? Where is the motivation for the developers of tools that produce non well-formed RSS to fix their products? If it is no longer XML then I can’t use off-the-shelf XML parsers nor can I stuff the feed through an XSLT transform. If it ceases being valid XML then it is not as amenable to the wonderful re-purposing that the internet allows.

All excellent points, except for one thing: I’m building a parser that will be used in an end-user product, and end users don’t care about RSS or XML or XSLT or standards of any kind. Nor should they care, nor should they be required to care.

Look at it this way: imagine you made a browser that only rendered sites authored in valid HTML or XHTML. How much of the web would your users be able to see? 1%? 0.1%? Less? My site validates (most days, but not all days, because sometimes I accidentally add invalid content and don’t fix it until later), as do a few other sites authored by fellow anal-retentive markup freaks. But your users wouldn’t be able to read any of the top 9 news sources, because none of them validate:

  1. CNN – invalid
  2. MSNBC – invalid
  3. ABCNews – invalid
  4. BBC – invalid
  5. FOXNews – invalid
  6. CNBC – invalid
  7. CBSNews – invalid
  8. NewsMax – invalid
  9. NPR – invalid

Having said that, I should reiterate that developers should absolutely care about standards, because they have all sorts of benefits for developers. For instance, I have gone to great lengths to code this weblog (and my other sites) to standards: it is valid XHTML 1.0 Strict, it is valid CSS, it complies with all published accessibility standards as well as I understand them, its feed is valid RSS 1.0 (as far as I can tell, though it’s impossible to be sure, since there are no validators for RSS 1.0 capable of catching errors like the one Bill Kearney caught, and yes, I’ve known about Leigh Dodds’s validator for several months now, but it doesn’t catch that sort of error either). The benefits to me, as a developer, are direct and immediate. Using pure CSS for layout allows my templates to be dead simple and easier to maintain and enhance, and it allows me to use semantic markup in ways that enhance accessibility (like using real headers). Using valid RSS 1.0 allows me the flexibility to extend my feed to include licensing information, link information, subscription information, and all kinds of other fun stuff which is so bleeding edge and convoluted that it could only realistically be read by a pure RDF parser that required valid RDF.

And you know what? End users don’t care about any of that. Even I, as the sole end user of my own ultra-liberal RSS parser (which I use in my homegrown RSS-to-email news aggregator), don’t care about any of that. At least not when I’m being an end user. In fact, one of the motivations for writing my parser in the first place was Aggie’s insistence on XML validity, which meant that I missed days at a time worth of news from Boing Boing, The Register, and other sites I follow whose feeds are less than perfect. Let me repeat that. I. Was. Missing. News. For days at a time. And I didn’t even realize it (because I didn’t scroll down in Aggie’s display to notice the error message). As a developer, I sympathize with your position. As an end user, I consider your position a fatal design flaw.

Postscript: if you want to evangelize, Syndic8 maintains a list of broken RSS feeds. You can even become a Syndic8 user and volunteer to help fix them. There are currently 1366 feeds awaiting repair.

If you want to evangelize within your program, I recommend doing something like iCab does. When it encounters a page authored in valid HTML, it puts a little green smiley icon next to the address bar. News aggregators could do something similar: try to parse the feed with a real RDF or XML parser (and indicate success with a smiley), and only fall back to a more liberal parser if the strict parser failed (and indicate this with a frown). But you must fall back to something, or you’ll just end up punishing your own users for the mistakes of web developers, which accomplishes nothing.
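
For the curious, here is a rough sketch in Python of what that strict-then-liberal fallback might look like. It is not the code from my parser or from any aggregator named above. The function names (parse_strict, parse_liberal, parse_feed) are made up for illustration, and it assumes you only care about item titles. The liberal half is a deliberately crude regular-expression scan, just enough to show the shape of the idea.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

# Hypothetical names throughout; this is an illustration, not a real aggregator.
ITEM_RE = re.compile(r"<item\b[^>]*>(.*?)</item\s*>", re.IGNORECASE | re.DOTALL)
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def local_name(tag):
    """Drop an ElementTree '{namespace}' prefix, so RSS 1.0 and 2.0 items both match."""
    return tag.rsplit("}", 1)[-1]


def parse_strict(feed_text):
    """Use a real XML parser. Raises ET.ParseError if the feed is not well-formed."""
    root = ET.fromstring(feed_text)
    titles = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        for child in element:
            if local_name(child.tag) == "title":
                titles.append((child.text or "").strip())
                break  # only the first title of each item
    return titles


def parse_liberal(feed_text):
    """Crude fallback: scan for <item> and <title> with regular expressions."""
    titles = []
    for item_body in ITEM_RE.findall(feed_text):
        match = TITLE_RE.search(item_body)
        if match:
            titles.append(unescape(match.group(1)).strip())
    return titles


def parse_feed(feed_text):
    """Return (titles, well_formed). Try strict first, fall back to liberal."""
    try:
        return parse_strict(feed_text), True   # show the smiley
    except ET.ParseError:
        return parse_liberal(feed_text), False  # show the frown, but still show the news
```

The second value returned by parse_feed is what the interface would use to pick the icon. The point is that a frown costs the end user nothing: the headlines still show up either way.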


© 2001–9 Mark Pilgrim