Well, the-format-who-must-not-be-named (Pie? Echo? c’mon, help name it!) seems to be gathering steam. Sam hosted a long discussion about permalinks and unique IDs; a majority seem to think that both should be required, but no final decision has been made.

Ken MacLeod makes an excellent point about Dublin Core: even if you don’t use the actual Dublin Core namespace or the actual Dublin Core element names, you can define your own element by saying that it is a particular Dublin Core concept (like dateCreated, or dateModified, or whatever other dates may make it into the model). These concepts have been well-defined for years, and you can re-use the concepts without re-using the syntax. Don’t know if that would be worth doing (we haven’t gotten to physical modelling yet), but I just thought it was interesting. Dublin Core rocks.
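
A rough sketch of what re-using the concepts without the syntax could look like, in Python. The element names on the left (`created`, `modified`, `author`) are purely hypothetical, since there is no physical model yet; the URIs on the right are the published Dublin Core identifiers for the corresponding concepts. The helper names are made up for illustration.

```python
# Hypothetical element names for the new format, each defined by
# pointing at an existing Dublin Core concept instead of reusing
# Dublin Core's own element syntax.
DC_TERMS = "http://purl.org/dc/terms/"
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"

CONCEPTS = {
    "created": DC_TERMS + "created",      # assumed name for "date created"
    "modified": DC_TERMS + "modified",    # assumed name for "date modified"
    "author": DC_ELEMENTS + "creator",    # assumed name for the entry's author
}


def concept_for(element_name):
    """Return the Dublin Core concept URI behind an element name, or None."""
    return CONCEPTS.get(element_name)


def to_dublin_core(entry):
    """Re-key an entry dict by Dublin Core concept URI, dropping unmapped keys."""
    return {CONCEPTS[key]: value for key, value in entry.items() if key in CONCEPTS}
```

A tool that only knows Dublin Core could then call `to_dublin_core()` on a parsed entry and never need to learn the format's own element names.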

Also, Ben and Mena have expressed interest and said that if/when this format becomes something concrete, they would consider supporting it in a future version of Movable Type. Blogger is also listening, and the growing list of supporters includes Brad Fitzpatrick of LiveJournal, Greg Reinacker of NewsGator, and Tim Bray. (Yes, that Tim Bray.) Tim also wrote a nice summary of why we need a new format at all, which is an important question that, as far as I’m concerned, has been satisfactorily answered.

Not to get ahead of myself, but perhaps I should start blocking off some time to work on a new validator. You know, in my copious free time. ;)

§

Thirty-four comments here

  1. I didn’t see any consensus that both should be required. I don’t think anyone has justified why there needs to be both a permaLink and a separate concept of a unique ID. If so I’d like to see a clean division as to where they belong in the scheme of things (syndication vs. comment posting vs. main entry posting).

    Anyway, it’s not like I care one way or the other. :)

    If I could suck it up and write code that handles RSS 0.91, 1.0 & 2.0, some new format with redundancies and funky rules of its own is just more of the same. I’d just like this to be over with as soon as possible so I can unfreeze development of RSS Bandit.

    — Dare Obasanjo #

  2. Remember, when this new format is done, it should also respect the non-blogging world that does read you. From an XML purist aspect I fail to see how Dublin Core elements would be any different than you guys just defining those elements yourself. That "dc:" thing, is that necessary? I realize it is a standard, but it is just as bad as Dave W putting "dw:" in front of everything.

    — Don Ulrich #

  3. Good point Dare, I’ve updated the entry to read “a majority seem to think … but no final decision has been made”.

    — Mark #

  4. I’m confused by Sam’s Roadmap.

    Are we talking about a syndication format, or a weblog editing protocol? There is no good reason why the syntax for one ought to resemble the syntax for the other (and many good reasons why they ought to be different).

    — Jacques Distler #

  5. “There is no good reason why the syntax for one ought to resemble the syntax for the other”

    Except simplicity and familiarity.

    — Anil #

  6. “Except simplicity and familiarity.”

    … at the expense of functionality.

    To pick one example (which has come up already), how should we specify a post’s content?

    In a weblog editing protocol, the best answer is “let it be an arbitrary string of bits”. The weblog software at the other end of the wire needs to know how to interpret that string of bits, but this need pose no constraint on the protocol.
    You can blog using Textile formatting, I can throw in itex equations, and we can both use exactly the *same* protocol.

    For a syndication format, “an arbitrary string of bits” sucks. You really need a rigid specification because *somebody else’s* aggregator needs to be able to render that content.

    — Jacques Distler #

  7. Jacques, are you aware of the several aggregators and client-side blogging tools that support posting to multiple weblog vendors’ tools? They allow one to post to one’s own weblog or to other weblogs’ comments. That’s where it makes sense to converge on weblog editing and comment posting protocols.

    — Ken MacLeod #

  8. Re. the post’s content, the answer is “any media type they want”, it could be a photo, a movie, an audio track, HTML, or any string of bits the particular tool likes to see. The format being defined allows for all of them, and lets the tool validate what it can accept.

    — Ken MacLeod #

  9. Don, reread Mark’s post and the linked comment. There’s nothing to say there has to be “dc:” things.

    The difference is, Dublin Core has spent almost ten years paring down and providing the simplest, clearest definition of fifteen concepts for describing resources on and off the web. Check them out in this short document: http://dublincore.org/documents/dces/

    If you then wonder, “what about ???”, then continue with the refinements: http://dublincore.org/documents/dcmi-terms/#H3

    The important thing is, we can use these concepts as-is, or use them in assisting to define our own.

    — Ken MacLeod #

  10. Am I aware?

    Absolutely. I use one (Kung-Log) for all my posting.

    It’s great that the same APIs work with multiple weblogging systems. But they do, precisely because certain things are “underspecified”.

    Comments are a different story. Even though Mark and I use the same weblogging system, comments on our blogs are, syntactically, extraordinarily different.

    Comments on my blog have “Subject” headers, are *threaded*, can be composed in a variety of formats, and must pass XHTML Validation before they can be posted.

    Comments on Mark’s blog are not threaded, must be plain text (with line-breaks converted to <p>s and <br>s), and — last I heard — you could always sneak in the odd unescaped &.

    And that’s two MovableType users…

    Syndication is yet another story, but here, alas, the problems are more political than technical.

    — Jacques Distler #

  11. Yes, I am with Ken on DC usage. See:

    http://radiocomments.userland.com/comments?u=112479&p=619&link=http%3A%2F%2Fwww.docuverse.com%2Fblog%2Fdonpark%2F2003%2F06%2F23.html%23a619

    — Don Park #

  12. I would hate to see Dublin Core stuff used as a starting point to define tags that would mean essentially the same thing. DC exists, it works well, it defines a worthwhile set of metadata, and I don’t understand what purpose forking from it would serve. Does the fact that some items will have a dc: in front of them really make them harder to understand? Does anyone not understand what dc:date.created means, for example?

    — ralph #

  13. Rant Mode=On (Mark, if you find this inappropriate, I’ll take it elsewhere)

    I’ve been following this conversation over various blogs and the wiki for awhile, and more than anything else the vibe I get is one of royal terror or abhorrence for existing standards or really anything that others have put work into.

    What I’m talking about:

    There are too many syndication formats. Solution? “Let’s create yet another one!”

    RDF, the foundation of the semantic web? “Eek! Scary!”

    RSS 1.0? “Auuugh! Based on scary RDF! Plus it brings back bad memories of the Flaming Times! No!”

    RSS 2.0? “No! Dave Winer will eat us all!”

    Dublin Core? “What the hell do those mooks know? We don’t need no stinkin’ standards”

    XLink? I haven’t even heard any discussion of this, even though it’s directly applicable to this whole postID/permaLink discussion.

    In fact, with the exception of XML itself, it sometimes seems like anything associated with the W3C or the semantic web effort is anathema. I’ve rarely heard such blatant statements of NIH as I’ve seen in postings about this project. I’m sorry about the tone of this message, it just gets to me because I’m in the planning stages of writing my own CMS and, having been amazed at the amount of work the W3C has already done for me in defining useful standards for semantic web applications, I’m foreseeing having to implement Yet Another Syndication Format that is informed by little if any of that work.

    — Avdi #

  14. I prefer the concept of a single format for editing, archiving and syndicating log entries. And I’d like it to be simple enough for a technical moron like me to be able to create an entry in a text editor, store it as a file on a web server and serve it as (X)HTML using a *osxom-type weblog tool.

    — Arthur #

  15. Will this be like a W3 “Recommendation?” Sure MT/blogger/et. al can put it in their default templates, but a lot of people modify those to their own tastes. People running weblogs now will have to bring their sites “up to speed” much like people trying to move from “tag soup” to valid HTML. Are we headed towards a time of “blog soup?”

    Will there be multiple “recommendations?” Will I have to specify a BlogType to tell tools which “recommendation” I’m following? Oh the humanity! ;-)

    — Patrick Berry #

  16. Avdi, most likely because we can’t all read every spec. I only just realized the pure depth of thought that went into Dublin Core researching it for this effort. I stopped reading “every” spec around RFC1700 :)

    For example, I didn’t realize (and haven’t confirmed myself) that XLink made statements about resource identity in the face of multiple potential reference URIs, only that it supports typing and relating of resources.

    Efforts like these, especially ones that are so open and available for organizing related info, require people who know “stuff” to do their part in getting that “stuff” out there.

    Another example is RDF. I have a pretty solid expectation that this effort will not be defined using the RDF model. As Norm Walsh said, “Fair enough.” What I do expect, though, is for people familiar with RDF to speak up about things that would make interoperability with RDF better, simpler, or easier. RDF applications will be among the users of Echo.

    Specific to RDF, which I haven’t jotted on the wiki yet:

    * Namespaces should end with a word delimiter, like ‘/’ or ‘#’. This doesn’t impact other XML namespace-based tools or recommendations, that I’m aware of.

    * RDF translators or XSLT should be linked from and/or developed on the Echo wiki. Ie. all your Echo information in one place.

    * It should be stated that use of Echo namespaces within RDF is a recognized usage.

    * Correspondingly, RDF developers, as users of Echo, should indicate where the content models of Echo elements could be more simply or discretely defined for interoperation without content translation. XSLT users would probably have the same concerns.

    — Ken MacLeod #
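
As a rough illustration of the first bullet above: RDF/XML turns an element name into a property URI by simply concatenating the namespace URI and the local name, so a namespace without a trailing `/` or `#` runs the two together. The namespace URIs below are placeholders (no Echo namespace has been chosen), and the function names are made up for this sketch.

```python
def rdf_property_uri(namespace_uri, local_name):
    """Build a property URI the way RDF/XML does: plain concatenation."""
    return namespace_uri + local_name


def ends_with_delimiter(namespace_uri):
    """Check the guideline that a namespace should end in '/' or '#'."""
    return namespace_uri.endswith(("/", "#"))


# Placeholder namespaces, assumed purely for illustration.
WITHOUT_DELIMITER = "http://example.org/echo"
WITH_DELIMITER = "http://example.org/echo/"

# The local name fuses with the last path segment of the namespace.
fused = rdf_property_uri(WITHOUT_DELIMITER, "title")

# The local name stays a separate, recognisable segment.
clean = rdf_property_uri(WITH_DELIMITER, "title")
```

Here `fused` comes out as `http://example.org/echotitle`, while `clean` is `http://example.org/echo/title`. Plain XML namespace tools treat both namespaces the same way, which is why the trailing delimiter costs nothing outside RDF.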

  17. I may be naive and speaking from a Marketing viewpoint, but it seems to me that by the time everyone agrees on a new format and implements it, Microsoft will have eaten everybody’s lunches.

    Real people (users) don’t care about standards. They just want something that solves their problems NOW. If you’re going to beat Microsoft, you better agree on something soon or you’re going to end up looking like the Iraqi army.

    — Vincent Flanders #

  18. Avdi: most of your post was completely appropriate; please take the mocking part elsewhere.

    Your comments about Dublin Core baffle me. We are well aware of it, and I fully expect we will use it, as appropriate, in one form or another, once we get down to physical modelling and spec writing.

    In general, I find little justification for your claim that no one in this effort is aware of prior art. A simple search of the wiki to this point finds multiple references:

    http://www.intertwingly.net/wiki/pie/FrontPage?action=inlinesearch&text_full=prior+art&context=40

    Also keep in mind that “respecting prior art” does not mean “doing everything the W3C says”. It means being aware of the work that has been done (in the W3C and elsewhere) and making informed decisions as to whether to apply that here, and how.

    This initiative does not want to repeat the mistakes of RSS 2.0, but we also do not want to repeat the mistakes of RSS 1.0. The designers of RSS 1.0 were convinced (at the time) that RDF was just beginning to flourish and that a full-blown Semantic Web was just around the corner. That was 3 years ago, and we’re still waiting.

    I am hoping that this initiative will be much more conservative. All aggregators and CMSes support consuming/generating plain XML, so that is probably a safe choice (as opposed to another serialization format, like YAML, or key-value pairs separated by carriage returns). Not all aggregators support namespaces, so probably the safe choice is to put all the required elements in a single namespace and make it the default namespace (so non-namespace-aware parsers can get all of that). The multitude of optional application-specific extensions can go in namespaces.

    Bleeding-edge recommendations like XLink (yeah, I know, July 2001, but where are the implementations? and the W3C can’t even decide whether to use it themselves, cf. HLink vs. XLink debate in XHTML 2.0) — stuff like that is probably out from the get-go. Betting the farm on unproven technology is one of the mistakes that RSS 1.0 made. Let’s not make the same mistakes again. Let’s make all new ones! ;)

    — Mark #
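
A minimal sketch of the default-namespace argument in comment 18, using Python's `xml.sax` with namespace processing left off (its default). The feed, its element names, and both namespace URIs are hypothetical. The point is only that elements in the default namespace show up under their bare names, while extension elements carry a prefix that a naive parser can skip.

```python
import xml.sax

# Hypothetical feed: core elements live in the default namespace,
# one optional extension element lives in its own prefixed namespace.
FEED = b"""<feed xmlns="http://example.org/echo/" xmlns:ext="http://example.org/ext/">
  <entry>
    <title>Hello</title>
    <ext:mood>cheerful</ext:mood>
  </entry>
</feed>"""


class PlainNameCollector(xml.sax.ContentHandler):
    """Collects element text keyed by raw tag name, with no namespace handling."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._current = None

    def startElement(self, name, attrs):
        # With namespace processing off, 'name' is the tag exactly as written.
        self._current = name
        self.values.setdefault(name, "")

    def characters(self, content):
        if self._current is not None:
            self.values[self._current] += content

    def endElement(self, name):
        self._current = None


def collect(document):
    """Parse the document and return {raw tag name: stripped text}."""
    handler = PlainNameCollector()
    xml.sax.parseString(document, handler)
    return {name: text.strip() for name, text in handler.values.items()}
```

Calling `collect(FEED)` finds the core element under the plain key `title`, and the extension under `ext:mood`, which a parser that knows nothing about namespaces can simply ignore.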

  19. Avdi: I just re-read your message. I had originally thought you said “I just wrote my own CMS”. What you actually said was “I’m in the planning stages of writing my own CMS”. Now that makes sense.

    Come back in a few months when you’re finished implementing it, and then maybe you’ll understand why nobody actually uses all those wonderful-looking specs the W3C pumps out.

    — Mark #

  20. Thanks for your reply, Mark. I think more was cleared up in that reply than in any number of “why we need a new format” justifications I’ve read before now. I still don’t fully understand what the problem with RDF is; it seems like the reason the semantic web hasn’t fully bloomed is *because* efforts like this decide not to use it… and the reason given is at least partly because it’s not in wide use. Which seems circular to me. It’s not like using RDF/XML requires aggregators to have specialized RDF libraries available to them; basic SAX will work just fine.

    On the other hand, it hadn’t even occurred to me that parsers lacking namespace support would be taken into account; I can see where that would severely limit the use of things like RDF and XLink in the core spec.

    Some of my statements may have been in reaction to isolated comments I’d read that don’t reflect majority opinion, like the one about Dublin Core. I guess it’s not clear to me whether DC will be supported in a handwavy “this element corresponds roughly to the Dublin Core Author element”, or in more rigorous ways that can be exploited by software (e.g. ontology mappings).

    As far as “bleeding-edge recommendations like XLink” - well, I’m not sure how newness makes something bad or useless. With namespaces out the window you probably can’t implement Echo links in terms of xlink, but a lot of thought went into that spec which seems to me to be directly applicable. XLink could inform the discussion of linking every bit as much as Dublin Core can inform the subject of standard metadata.

    Say what you will about W3C standards, but the fact is that people write software libraries to support them, allowing other people to build software at the next higher level of abstraction. A lot of nifty software now exists that wouldn’t if XML hadn’t enabled us all to serialize structured data without thinking too hard about *how* it was serialized. And one thing that will determine whether this Semantic Web thing ever takes off is whether clever young hackers with new ideas for how to exploit the web have the tools available to them to work at this next higher level of abstraction, or whether they will get bogged down in writing parsers and models for myriad protocols and eventually give up. So far it seems like the focus of this discussion has been on making the format as easy as possible to generate and parse. I agree that if it’s too hard to write a generator, no one will use the format. But ultimately, only a very few people are going to be writing parsers and generators. The more impedance-matching that is needed to bridge from that writer-friendly format to, say, fully namespaced RDF with every element mapped to well-defined standard vocabularies, the less likely it is we’ll see emergent applications which exploit all this metadata we’re exposing to the world in ways we never thought of.

    Finally, a meta-comment: I understand the comment moderation policy around here (and somewhat expected to be moderated), but while it makes it clear what is unacceptable without censoring, it doesn’t make it clear how to self-correct the offending comment. Is there any way for me to edit my comments after the fact?

    — Avdi #

  21. Hm. I’ve read some more on the Wiki and it looks like I may have ranted too quickly. I like wikis, but the one problem I’ve found with them is that it’s sometimes hard to find where the subjects that interest me are being discussed. Anyway, it looks like most if not all of my concerns have been addressed, or at least brought up, by others.

    — Avdi #

  22. Avdi, good followup. Just one nit: as someone who knows RDF and parsing it into simple structures, I can tell you that “basic SAX will work just fine” is not true. The graph nature of RDF means that the serialization is not predetermined as to where in the element tree RDF statements will occur.

    On the other hand, here is a simple, 10k RDF/XML -> triples parser, based on SAX:

    http://bitsko.slc.ut.us/2003/06/foaf-check/rdf.py

    and a module that takes the triples and creates a Python data structure from them (along with checking the FoaF)

    http://bitsko.slc.ut.us/2003/06/foaf-check/FoafCheck.py

    — Ken MacLeod #

  23. The eminently sensible Edd Dumbill has eminently sensible things to say about this subject at ( http://usefulinc.com/edd/blog/2003/6/25#13:44 ).

    I haven’t seen this said explicitly on the Wiki (maybe it has been), but I’ve been wondering why RDF isn’t just piggybacked onto the syndication format. So: everyone comes up with the best, most formally specified syndication format that can be, with a consistent data model that doesn’t have anything to do with RDF, and then we say: “okay, there’s the format, now if you want to do complex, outside-of-the-spec kinds of metadata, just slot in your RDF within the document, and away you go.”

    Since it’s in a different namespace, a pure syndication client will ignore rdf:RDF stuff, but a Semantic Web-oriented syndication client will pick it up and do what it pleases. Now we have a simple syndication format and a well-specified metadata format, all in one, but the people who don’t want to worry about the well-specified metadata format can just go about their business.

    Maybe this leads to specifying things twice (i.e. published-on dates that you might want to express inside of RDF *and* inside of the syndication format), but a little redundancy for the sake of clarity seems to be fine, especially if things are gzipped. I think I’d rather see a redundant format than one which tries to pile everything together (per Edd’s comments, above). I’d rather program to a redundant format, too, if I can keep my semantics in a row and not shed hot namespace tears.

    In reading about W3C technologies: what’s frustrating to me is the overlap. I use XLink and RDF internally for a project and there’s huge shared space between the two; they’re crying out for some sort of unified approach: RDF with linking semantics, maybe, where xlink:arcroles are formally specified as RDF resources, although that sounds painful and complex. All are good standards, but their legacies (I think the origin of XLink was in the HyTime community; for RDF, the Knowledge Rep/AI community did most of the work) mean that, while both standards are about linking resources, their means of doing so are not conceptually unified–they CAN be conceptually unified, it may not be apples and oranges as much as oranges and tangerines. But, that said, I find this really frustrating; it’s like having to use SQL inside of Python, say, switching from one problem-solving approach to another. You get used to it, but I’d rather just access native data structures than skip around all the time.

    Sorry to drag on. I think I need to start a web site somewhere so I can write about these ideas at length.

    — Paul #
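
A hedged sketch of the piggybacking idea in comment 23, using Python's `xml.etree.ElementTree`. The entry markup and the `http://example.org/echo/` namespace are hypothetical; the RDF and Dublin Core namespace URIs are the real ones. A plain syndication client reads only children in the format's own namespace, while a Semantic Web-oriented client can pull out the embedded `rdf:RDF` block and hand it to whatever RDF parser it prefers.

```python
import xml.etree.ElementTree as ET

FORMAT_NS = "http://example.org/echo/"                  # hypothetical
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # real RDF namespace

# Hypothetical entry with an RDF block slotted in alongside the core elements.
ENTRY = """<entry xmlns="http://example.org/echo/"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Hello</title>
  <rdf:RDF>
    <rdf:Description rdf:about="http://example.org/entries/1">
      <dc:creator>Paul</dc:creator>
    </rdf:Description>
  </rdf:RDF>
</entry>"""


def core_fields(entry_xml):
    """Return {local name: text} for direct children in the format's own namespace."""
    root = ET.fromstring(entry_xml)
    prefix = "{" + FORMAT_NS + "}"
    return {
        child.tag[len(prefix):]: (child.text or "").strip()
        for child in root
        if child.tag.startswith(prefix)
    }


def embedded_rdf(entry_xml):
    """Return the rdf:RDF child elements, untouched, for an RDF-aware client."""
    root = ET.fromstring(entry_xml)
    return root.findall("{" + RDF_NS + "}RDF")
```

With this entry, `core_fields(ENTRY)` sees only `title`, and `embedded_rdf(ENTRY)` returns the single RDF block. The redundancy Paul mentions would show up as the same fact appearing in both places.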

  24. Ken: Good point about RDF serialization, I’d forgotten that. I guess what I’m getting at is that one could come up with an XML schema that parsed OK as RDF, but was a bit more constrained than RDF.

    — Avdi #

  25. Avdi, your comment on wiki’s ‘line-of-sight’ problem is not uncommon which is why I consider Wiki to be a half-evolved form of communication.

    — Don Park #

  26. I’d like to see the format coalesce quickly (hopefully with a couple of popular implementations or plug-ins to existing software), but I think it’s only half the battle.

    I concur with Paul above that RDF and XPath need some duct tape in between, but beyond formats and referencing, behaviour and interaction are the next logical steps.

    I’m going to take a bullet (Matrix-style, I hope) and say that Trackback has to grow up - not that it isn’t a good idea (and I like the concept), but implementing it alongside SOAP or XML-RPC feels like parking a cart alongside an SUV - it’s kinda rickety, won’t accommodate extra data, and (despite being admittedly lighter and handier in tight curves) doesn’t feel that solid.

    Anyone up for defining a SOAP interface to blogs that would merge with the format? Like Mark, my spare time is heavily bound (to my desk, with an ACME 15-ton weight), but I can still think :)

    — Rui Carmo #

  27. Rui,
    Take a look at the CommentAPI, it is along the lines of TrackBack, but using RSS 2.0 as its serialization. Now imagine updating the CommentAPI to use Echo.

    http://wellformedweb.org/story/9

    Same for the Blogger API, look at RESTLog.

    http://wellformedweb.org/news/5

    — Joe #

  28. Avdi,
    I have to second Mark’s advice, try implementing the W3C specs first. Also hang out on the [xml-dev] mailing list, or maybe [www-tag], it is quite eye-opening. I too used to be under the impression that the specs flowing from the W3C were all firmly thought out, steadily building on top of each other, all the interactions between them planned to the finest detail. In retrospect, that’s not a very healthy perspective.

    Anyway, that might be the plan, but it is far from the reality. Just look at the issues Paul and Mark brought up with the overlap between XLink, HLink and RDF. For some perspective check out the TAG issues list:

    http://www.w3.org/2001/tag/ilist

    The TAG being a group formed specifically to resolve these types of issues.

    — Joe #

  29. Incessant Ramblings (trackback)
  30. http://www.xulplanet.com/cgi-bin/ndeakin/homeN.cgi?ai=133

    — Dan #

  31. Life in the Zu (trackback)
  32. Life in the Zu (trackback)
  33. Is there an example out there on how “echo” looks/will look like? Thanks!

    — David Collantes #

  34. Here’s a first draft:

    http://www.intertwingly.net/blog/1506.html

    — Mark #


© 2001–8 Mark Pilgrim