<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3"
  xmlns="http://purl.org/atom/ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:lang="en">
  <title mode="escaped">dive into mark</title>
  <link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
  <link rel="next" type="application/atom+xml" href="http://diveintomark.org/xml/2004/05/index.atom"/>
  <modified>2004-05-04T04:07:50Z</modified>
  <author>
    <name>Mark Pilgrim</name>
    <url>http://diveintomark.org/</url>
  </author>

  <tagline mode="escaped">Every time you use TypeKey, God kills a kitten.</tagline>
  <id>tag:diveintomark.org,2004:3</id>
  <generator url="http://www.movabletype.org/" version="3.0b2">Movable Type</generator>
  <copyright mode="escaped">Copyright &#169; 2004, Mark Pilgrim</copyright>
  <entry>
    <title mode="escaped">Another crack at user-friendly feeds</title>
    <link rel="alternate" type="text/html" href="http://diveintomark.org/archives/2004/05/02/user-friendly-feeds"/>
    <modified>2004-05-03T13:18:26Z</modified>
    <issued>2004-05-02T21:50:39-05:00</issued>
    <id>tag:diveintomark.org,2004:3.3698</id>
    <created>2004-05-03T01:50:39Z</created>
    <summary mode="escaped">I&apos;m styling my Atom feed with CSS.  No, not for aggregator users... for browser users. (613 words)</summary>
    <dc:subject>CSS</dc:subject>
    <content type="text/html" mode="escaped" xml:base="http://diveintomark.org/archives/2004/05/02/user-friendly-feeds">
      <![CDATA[<p>View <a href="http://diveintomark.org/xml/atom.xml">my Atom feed</a> in a real browser, where by <q>real</q>, I mean <q>Mozilla-based</q>.  (That's not fair; it works flawlessly in the latest version of Opera too.)  You should get a page that looks very much like the rest of my site, with a friendly little blurb at the top explaining a little about syndication and what this <q>feed</q> thing is that you just clicked on.</p>

<p>This is not an original idea; Blogger does something similar for all of their feeds.  In fact, the blurb at the top is the <code>info</code> element, which was <a href="http://www.shellen.com/sandbox/atom-info-proposal.html">originally proposed by Jason Shellen</a>, a Blogger employee, and later refined in Atom 0.3 to make the content model match other Atom elements.  Their blurb points users to the Blogger Knowledge base, which seems reasonable.  The wording of mine is open for discussion.</p>

<p>What is new here, I believe, is the use of inline XHTML (properly namespaced, of course; <a href="http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fdiveintomark.org%2Fxml%2Fatom.xml">this is still a valid Atom feed</a>) to add a few other things to the page for browser users.  The page header is a series of <code>&lt;div&gt;</code>s (the images and positioning are entirely defined in the associated CSS file, just like the rest of my site), and the breadcrumb trail is also a piece of hard-coded XHTML wedged in the middle of the feed in the appropriate place.</p>

<p>I have tested this with a number of Atom-enabled aggregators, and none seem to have any problem with it.  Let me know if your Atom-enabled client misbehaves.</p>

<p>Note that the display doesn't look right in IE/Win.  I am shocked, <strong>shocked</strong>.  Also, the links don't work in IE (the breadcrumb trail should be clickable except for the last bolded link, and the letters in the page header should be clickable).</p>

<p>Instead of styling XML with CSS, another potential solution would be to associate it with an XSLT transform and let the client convert it to HTML.  This would probably solve the cross-browser-CSS problem, since I could just transform it into exactly the same markup I use elsewhere throughout my site.  It would also create all new, even more exciting problems in trying to create cross-browser XSLT.</p>

<p><strong>Update:</strong></p>

<p>The entry titles aren't links in any browser, but they're not meant to be.  Some have suggested they should be, but I disagree.  I don't want to make this page *too* useful in a browser.  I do not, for instance, want people visiting this page all the time in their browser.  I just want to make it friendlier to first-timers (or accidental click-throughs) than dumping raw XML.  The one and only goal of the page is to get visitors to subscribe to the feed in a feed reader.</p>

<p>Others have noticed that I've switched from <code>application/atom+xml</code> to <code>application/xml</code> to get this demo to work.  Yeah, that sucks.  <code>application/xml</code> is correct, in the sense that it's not <em>wrong</em>.  It's not <em>optimal</em>, but at least it's not <code>text/xml</code>.  Oh God, let's not have that discussion again.</p>

<p>I'm surprised no one has pointed out the irony of my using inline XHTML at all, given my strong and well-publicized opinion of XHTML for general use.  In fact my HTML is virtually XHTML anyway, except for unclosed <code>&lt;img&gt;</code> tags.  I may employ some quick regular expressions and inline XHTML for the full content in my feeds, since Atom supports that and all Atom-enabled aggregators I've seen support that.  <a href="http://diveintomark.org/archives/2003/08/29/semantics">As I've mentioned before</a>, this is the only real use I've seen for XHTML.  And then I could display the full content on my styled-for-browsers Atom feed.  (<a href="http://intertwingly.net/blog/index.atom">Sam does this</a>.)  Not sure if that would be an improvement worth making.</p>]]>
      
    </content>
  </entry>
  <entry>
    <title mode="escaped">Essentials</title>
    <link rel="alternate" type="text/html" href="http://diveintomark.org/archives/2004/05/01/essentials"/>
    <modified>2004-05-02T19:54:50Z</modified>
    <issued>2004-05-01T12:04:43-05:00</issued>
    <id>tag:diveintomark.org,2004:3.3695</id>
    <created>2004-05-01T16:04:43Z</created>
    <summary mode="escaped">This should not come as a raging shock to anyone, but I am becoming a technological curmudgeon.  I have developed ways of doing things with computers that work for me, and I find there are now a wide variety of things I am no longer willing to discuss doing differently. (1416 words)</summary>
    <dc:subject></dc:subject>
    <content type="text/html" mode="escaped" xml:base="http://diveintomark.org/archives/2004/05/01/essentials">
      <![CDATA[<p>This should not come as a raging shock to anyone, but I am becoming a technological curmudgeon.  I have developed ways of doing things with computers that work for me, and I find there are now a wide variety of things I am no longer willing to discuss doing differently.</p>

<ol>
<li><p><strong>Text editing: <a href="http://www.gnu.org/software/emacs/emacs.html">GNU/Emacs</a></strong>.  All non-cross-platform editors are out; I do serious text editing (read: writing 500-page books in DocBook for money) on too many platforms on a regular basis to learn new keyboard shortcuts and macros, or to try to remember where I am today.  All editors that don't work identically in console mode (over an ssh connection) are out for the same reason.  All non-macro-customizable editors are out.  All non-syntax-highlighting editors are out (I work primarily in XML, XSL, CSS, HTML, Python, and recently Java).</p>
<p>That doesn't leave a lot.  I know you can script vi with Python, and I know you can use real arrow keys now, and I've tried vi and vim and gvim and Lemmy and all the rest; they just don't fit my brain.  I've tried <a href="http://www.xemacs.org/">XEmacs</a>, but its <kbd>dired</kbd> functionality works much differently than GNU/Emacs, and I use <kbd>dired</kbd> a lot, and I like how GNU/Emacs works better.  I did steal a bunch of the packaged editing modes from an XEmacs installation and throw them in my GNU/Emacs site-lisp, though, so thanks for that.</p>
<p>Reasons I would leave GNU/Emacs: XEmacs really is nicer in many ways, much more professionally packaged and maintained.  If I'm feeling brave, I might give it another shot and see if I can get its <kbd>dired</kbd> to make sense to me.  Oh, and an official Mac OS X port would be nice; I'm using this <a href="http://www.mindlube.com/products/emacs/">unofficial build of GNU/Emacs for Panther</a>, which works well today, but it's <a href="http://members.shaw.ca/akochoi-emacs/">no longer being maintained</a>.  *sigh*  There's always <a href="http://homepage.mac.com/nand/emacsbuild/">nightly builds</a> (thanks <a href="http://www.the-forgotten.org/">Michael</a>).</p>
</li>
<li><p><strong>Web browsing: <a href="http://www.mozilla.org/products/firefox/">Mozilla Firefox</a></strong>.  Again, all non-cross-platform browsers are out; I set up my preferences identically on all platforms and rely on those preferences wherever I am.</p>
<p>Reasons I would leave Firefox: some important UI feature, or lack of support for some emerging web standard.  I've seen all the emerging web standards, and I don't see any must-haves.</p>
</li>
<li><p><strong>Mail: <a href="http://www.mozilla.org/products/thunderbird/">Mozilla Thunderbird</a> over IMAP</strong>.  Identical cross-platform functionality is less of an issue here, since I don't spend that much time in my mail client (relatively speaking).  But IMAP is essential (too many computers, and all webmail systems suck, and the ones that don't suck have no data export), and Thunderbird's IMAP support is top-notch.  Mail.app is close, and it has better indexing and searching.  Every few months I take all the messages I no longer care about having easy access to from anywhere, and download them to my Mac via Mail.app via POP, stick them in folders based on year (I have mail going back to 1993), and those mbox files get thrown into the daily backup rotation and archived in multiple places.  A few times a year, I find a reason to search <em>all</em> my mail for something, and I do it in Mail.app.  Of course if my Mac is not convenient, I can ssh into it and grep through the mbox files manually.</p>
<p>Reasons I would leave Thunderbird: stopped being actively developed, while having showstopping bugs.</p>
<p>Reasons I would leave Mail.app: when GMail supports bulk import and easier SSL support (yes I know you can get it by manually changing the URL... every time you log in), I would consider using it as a searchable repository.  I would still use some local solution like Mail.app for permanent archiving.  Even Google won't last forever.</p>
</li>
<li><p><strong>XML parsing: <a href="http://www.xmlsoft.org/">libxml2</a></strong>.  It's <a href="http://xmlbench.sourceforge.net/results/benchmark200402/index.html">faster than anything</a>, has <a href="http://xmlbench.sourceforge.net/results/features200402/index.html">bindings for everything</a>, and is insanely conformant to all specifications it claims to support.  Also, it integrates with <a href="http://www.gnu.org/software/libiconv/">libiconv</a> to handle every character encoding ever.  Why doesn't everyone use iconv?  Character encoding is a solved problem.</p>
<p>Reasons I would leave libxml2: none.</p>
</li>
<li><p><strong>Remote connections: ssh</strong>.  An obvious one, but I'm surprised how few people know how much it can do.  I set up SSH <em>all the time</em> to forward random ports from one machine to another.  I tunnel rsync over ssh.  I tunnel cvs over ssh.  I send outgoing mail by forwarding port 25 to my mail server (this is in fact the only way to send mail through my mail server; I've set it up to only accept outgoing mail from local users).</p>
<p>Reasons I would leave ssh: none.</p>
</li>
<li><p><strong>Backup and mirroring: rsync</strong>.  <kbd>rsync -essh -rtpvz</kbd> rocks.  Really, there's nothing more to say.  Learn it.  Use it.  Love it.  Here's a good rsync anecdote: in my last job, I worked on a project that was doing daily (and sometimes more-than-once-per-day) builds of a 100 MB installer.  Near the end of the release cycle, we were putting each daily build on a private web server for the client to download and test.  Uploading the entire build took over an hour on my capped DSL line.  It turns out that the fastest way to do this is to ssh into the server, duplicate yesterday's build to a file with today's date, then rsync today's build up to the server.  rsync magically figures out which parts of the installer have changed (usually not more than a few KB) and synchronizes the build in under a minute.  I have no idea how it does that.  I read once that it was somebody's PhD project.  Thank God for smart people.</p>
<p>If you're mirroring stuff from one Mac to another, and you care about resource forks, install <a href="http://www.macosxlabs.org/RsyncX/RsyncX.html">RsyncX</a> on both machines and breathe normally.</p>
<p>Reasons I would leave rsync: none.</p>
</li>
<li><p><strong>Web server: <a href="http://httpd.apache.org/">Apache</a></strong>.  IIS is out because it only runs on Windows.  I'm not interested in specialty web servers (I've tried, for internal projects, and always eventually needed enough functionality that we switch to Apache, or wish we had).  I'm not interested in proprietary add-ons or derivatives of Apache, although if you have to extend a web server, Apache is a great choice.  I just don't need to.</p>
<p>Reasons I would leave Apache: none.</p>
</li>
<li><p><strong>Server: Debian GNU/Linux</strong>.  I've tried RedHat, Mandrake, Gentoo, and FreeBSD, but the fanatical devotion of the Debian package maintainers makes the difference.</p>
<p>Reasons I would leave Debian: stopped being maintained.</p>
</li>
</ol>
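<p>On the rsync anecdote in the list above: the reason a mostly-unchanged 100 MB file syncs so quickly is block matching. What follows is a minimal sketch of that idea, not rsync's actual code. The function names, the tiny block size, and the use of SHA-256 as the strong hash are all assumptions made for illustration, and the weak checksum is recomputed for every window here, whereas real rsync rolls it forward one byte at a time.</p>

```python
import hashlib
import zlib


def block_signatures(old, block_size):
    """Receiver side: index each fixed-size block of the old file by checksum."""
    index = {}
    for offset in range(0, len(old), block_size):
        chunk = old[offset:offset + block_size]
        weak = zlib.adler32(chunk)                 # cheap filter
        strong = hashlib.sha256(chunk).digest()    # confirms a real match
        index.setdefault(weak, {})[strong] = offset
    return index


def make_delta(new, index, block_size):
    """Sender side: describe the new file as block references plus literal bytes."""
    ops = []
    literal = bytearray()
    pos = 0
    while pos != len(new):
        window = new[pos:pos + block_size]
        offset = None
        candidates = index.get(zlib.adler32(window))
        if candidates is not None:
            offset = candidates.get(hashlib.sha256(window).digest())
        if offset is None:
            # No known block starts here: keep one byte as literal data and slide on.
            literal.append(new[pos])
            pos += 1
            continue
        if literal:
            ops.append(("data", bytes(literal)))
            literal = bytearray()
        # The receiver already has this block, so only a reference is sent.
        ops.append(("copy", offset, len(window)))
        pos += len(window)
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old, ops):
    """Receiver side: rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

<p>In the anecdote, yesterday's build plays the role of <code>old</code> on the server and today's build is <code>new</code> on the uploading side. Only the <code>"data"</code> operations carry file content over the wire, which is why a few changed kilobytes do not cost a full upload.</p>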

<p>Of course, there are also a wide variety of things for which I have no strong allegiance to one tool over another.</p>

<ol>
<li>instant messaging (Currently using Trillian on Windows, GAIM on Linux, Adium on Mac OS X.)</li>
<li>IRC chat (Currently using Chatzilla, which sucks because it means installing the entire Mozilla suite and throwing away 60 MB for an IRC chat client, but my new laptop has more than enough memory, so what the heck. <strong>Update:</strong> <a href="http://www.hacksrus.com/~ginda/chatzilla/">Chatzilla on Firefox</a>.  Woohoo!  Gotta love a domain name like <code>hacksrus.com</code>.  And with a <a href="http://diveintomark.org/archives/2002/10/04/history_of_the_tilde">tilde in the URL</a> to boot!)</li>
<li>photo editing (Currently using GIMP and Paint Shop Pro on Windows, GIMP on Linux, and an old version of Photoshop on Mac OS X.  I have very light photo editing requirements, so it's not worth upgrading to the latest version of Photoshop.)</li>
<li>music management/ripping/burning (Currently using iTunes on Windows and Mac OS X, but I've never bought anything from the iTunes Music Store, so no lock-in.)</li>
<li>syndicated feed reader (Currently back to using <a href="http://www.bloglines.com/">Bloglines</a> after paying for <a href="http://www.bradsoft.com/feeddemon/index.asp">FeedDemon</a> and being underwhelmed by its Atom support and the general lack of development... yeah <a href="http://nick.typepad.com/blog/2004/week12/index.html">I know that's rude</a>, and I wish Nick the best, and I hope he's back on his feet and improving what could be the best feed reader in the world.)</li>
<li>weblog publishing tool (Currently using Movable Type 3.0 beta, plus lots of macros and customizations like a custom URL scheme, so there's a relatively high switching cost.)</li>
<li>source control (Currently using CVS, looked at Subversion last November, and again when it hit 1.0, and they had showstoppers like the inability to use digest authentication, forcing me to choose between basic authentication (which is no better than CVS pserver) or giving all my CVS users local accounts and tunneling over ssh (which is no better than CVS ext).  Yeah, I know about atomic commits, and they're nice, but I'm not doing large-scale development, and rollbacks and tagging are a low priority for me.)</li>
<li>scripting language (Currently using Python, but I'm bored, or maybe just exhausted from writing a book on it, so I'm idly looking around for My Next Programming Language.)</li>
</ol>]]>
      
    </content>
  </entry>
  <entry>
    <title mode="escaped">Universal Feed Parser 3.0 beta 22</title>
    <link rel="alternate" type="text/html" href="http://diveintomark.org/archives/2004/04/19/feed-parser-beta-22"/>
    <modified>2004-04-20T03:31:10Z</modified>
    <issued>2004-04-19T23:30:00-05:00</issued>
    <id>tag:diveintomark.org,2004:3.3648</id>
    <created>2004-04-20T03:30:00Z</created>
    <summary mode="escaped">I had a flash of insight and suddenly the entirety of Python&apos;s Unicode support became clear to me.  I coded madly for several hours until it faded.  It&apos;s entirely possible that that&apos;s just the LSD talking. (865 words)</summary>
    <dc:subject>Python</dc:subject>
    <content type="text/html" mode="escaped" xml:base="http://diveintomark.org/archives/2004/04/19/feed-parser-beta-22">
      <![CDATA[<p><a href="http://diveintomark.org/projects/feed_parser/feedparser-3.0-beta-22.zip">3.0 beta 22</a> of my <a href="http://diveintomark.org/projects/feed_parser/">Universal Feed Parser</a> is out.  This release fixes all known bugs, and I hope it will be the last beta before 3.0 final.  After all, this is getting a bit ridiculous.</p>

<p>The release makes a significant change: if XML parsing fails due to character encoding problems, the parser will attempt to auto-determine the character encoding and re-parse with a real XML parser.  This is noted in the results as <code>results['bozo'] = 1</code> and <code>results['bozo_exception'] = feedparser.CharacterEncodingOverride</code>.  <code>results['encoding']</code> will contain the encoding that was actually used to parse the feed (not the original declared encoding).</p>
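<p>The fallback described above can be sketched roughly as follows. This is an illustration of the technique using only the standard library, not the parser's own code. The list of candidate encodings, the <code>parse_with_fallback</code> name, and the local <code>CharacterEncodingOverride</code> class are assumptions for the example, and <code>xml.etree.ElementTree</code> stands in for the real XML parser.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical candidate list; the post does not say which encodings are tried.
COMMON_ENCODINGS = ("utf-8", "windows-1252", "iso-8859-1")


class CharacterEncodingOverride(Exception):
    """Local stand-in for the marker exception named in the post."""


def parse_with_fallback(data, declared_encoding):
    """Parse XML bytes, retrying with other encodings if the declared one fails."""
    results = {"bozo": 0, "encoding": declared_encoding, "root": None}
    tried = set()
    last_error = None
    for encoding in (declared_encoding,) + COMMON_ENCODINGS:
        if encoding.lower() in tried:
            continue
        tried.add(encoding.lower())
        try:
            root = ET.fromstring(data.decode(encoding))
        except (UnicodeDecodeError, LookupError, ET.ParseError) as error:
            last_error = error
            continue
        results["root"] = root
        if encoding != declared_encoding:
            # Parsing worked, but only after overriding the declared encoding.
            results["bozo"] = 1
            results["bozo_exception"] = CharacterEncodingOverride(
                "declared %s, parsed as %s" % (declared_encoding, encoding))
            results["encoding"] = encoding
        return results
    # Nothing worked: flag the feed and report the last failure.
    results["bozo"] = 1
    results["bozo_exception"] = last_error
    return results
```

<p>A caller that only cares about the data can ignore <code>results["bozo"]</code>. A caller that wants to warn feed authors can check it and look at <code>results["encoding"]</code> to see what was actually used.</p>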

<p>This release makes another significant change: Unicode support for ill-formed feeds.  All individual data values will be returned as Unicode strings if they can be converted using the document's character encoding.  I had a flash of insight and suddenly the entirety of Python's Unicode support became clear to me.  I coded madly for several hours until it faded.  It's entirely possible that that's just the LSD talking, but thanks to the magic of open source, everyone can now share in my good trip.</p>

<p>This release also makes significant changes to internal classes.  If you were subclassing or accessing these classes, your code will likely break.  If you were just using the public <code>parse()</code> function, you will not notice any change.</p>

<p>My change reporting history has been lax throughout the 3.0 beta process, so I went back and recreated it from file timestamps, comments, and judicious use of <code>diff</code>.  Full user documentation is coming next.</p>

<dl>
<dt>3.0b3 - 1/23/2004 - MAP</dt>
<dd>
<ul>
<li>parse entire feed with real XML parser (if available)</li>
<li>added several new supported namespaces</li>
<li>fixed bug tracking naked markup in description</li>
<li>added support for enclosure</li>
<li>added support for source</li>
<li>re-added support for cloud which got dropped somehow</li>
<li>added support for expirationDate</li>
</ul>
</dd>
<dt>3.0b4 - 1/26/2004 - MAP</dt>
<dd>
<ul>
<li>fixed xml:lang inheritance</li>
<li>fixed multiple bugs tracking xml:base URI, one for documents that don't define one explicitly and one for documents that define an outer and an inner xml:base that goes out of scope before the end of the document</li>
</ul>
</dd>
<dt>3.0b5 - 1/26/2004 - MAP</dt>
<dd>
<ul>
<li>fixed bug parsing multiple links at feed level</li>
</ul>
</dd>
<dt>3.0b6 - 1/27/2004 - MAP</dt>
<dd>
<ul>
<li>added feed type and version detection, result["version"] will be one of SUPPORTED_VERSIONS.keys() or empty string if unrecognized</li>
<li>added support for creativeCommons:license and cc:license</li>
<li>added support for full Atom content model in title, tagline, info, copyright, summary</li>
<li>fixed bug with gzip encoding (not always telling server we support it when we do)</li>
</ul>
</dd>
<dt>3.0b7 - 1/28/2004 - MAP</dt>
<dd>
<ul>
<li>support Atom-style author element in author_detail (dictionary of "name", "url", "email")</li>
<li>map author to author_detail if author contains name + email address</li>
</ul>
</dd>
<dt>3.0b8 - 1/28/2004 - MAP</dt>
<dd>
<ul>
<li>added support for contributor</li>
</ul>
</dd>
<dt>3.0b9 - 1/29/2004 - MAP</dt>
<dd>
<ul>
<li>fixed check for presence of dict function</li>
<li>added support for full Atom content model in summary</li>
</ul>
</dd>
<dt>3.0b10 - 1/31/2004 - MAP</dt>
<dd>
<ul>
<li>incorporated ISO-8601 date parsing routines from xml.util.iso8601</li>
</ul>
</dd>
<dt>3.0b11 - 2/2/2004 - MAP</dt>
<dd>
<ul>
<li>added 'rights' to list of elements that can contain dangerous markup</li>
<li>fiddled with decodeEntities (not right)</li>
<li>liberalized date parsing even further</li>
</ul>
</dd>
<dt>3.0b12 - 2/6/2004 - MAP</dt>
<dd>
<ul>
<li>fiddled with decodeEntities (still not right)</li>
<li>added support for Atom 0.2 subtitle</li>
<li>added support for Atom content model in copyright</li>
<li>better sanitizing of dangerous HTML elements with end tags (script, frameset)</li>
</ul>
</dd>
<dt>3.0b13 - 2/8/2004 - MAP</dt>
<dd>
<ul>
<li>better handling of empty HTML tags (br, hr, img, etc.) in embedded markup, in either HTML or XHTML form (&lt;br>, &lt;br/>, &lt;br />)</li>
</ul>
</dd>
<dt>3.0b14 - 2/8/2004 - MAP</dt>
<dd>
<ul>
<li>fixed CDATA handling in non-wellformed feeds under Python 2.1</li>
</ul>
</dd>
<dt>3.0b15 - 2/11/2004 - MAP</dt>
<dd>
<ul>
<li>fixed bug resolving relative links in wfw:commentRSS</li>
<li>fixed bug capturing author and contributor URL</li>
<li>fixed bug resolving relative links in author and contributor URL</li>
<li>fixed bug resolving relative links in generator URL</li>
<li>added support for recognizing RSS 1.0 in results['version']</li>
<li>passed Simon Fell's namespace tests, and included them permanently in the test suite with his permission</li>
<li>fixed namespace handling under Python 2.1</li>
</ul>
</dd>
<dt>3.0b16 - 2/12/2004 - MAP</dt>
<dd>
<ul>
<li>fixed support for RSS 0.90 (broken in b15)</li>
</ul>
</dd>
<dt>3.0b17 - 2/13/2004 - MAP</dt>
<dd>
<ul>
<li>determine character encoding as per RFC 3023</li>
</ul>
</dd>
<dt>3.0b18 - 2/17/2004 - MAP</dt>
<dd>
<ul>
<li>always map description to summary_detail (Andrei)</li>
<li>use libxml2 (if available)</li>
</ul>
</dd>
<dt>3.0b19 - 3/15/2004 - MAP</dt>
<dd>
<ul>
<li>fixed bug exploding author information when author name was in parentheses</li>
<li>removed ultra-problematic mxTidy support</li>
<li>patch to work around crash in PyXML/expat when encountering invalid entities (MarkMoraes)</li>
<li>support for textinput/textInput</li>
</ul>
</dd>
<dt>3.0b20 - 4/7/2004 - MAP</dt>
<dd>
<ul>
<li>added CDF support</li>
</ul>
</dd>
<dt>3.0b21 - 4/14/2004 - MAP</dt>
<dd>
<ul>
<li>added Hot RSS support</li>
</ul>
</dd>
<dt>3.0b22 - 4/19/2004 - MAP</dt>
<dd>
<ul>
<li>map 'channel' to 'feed', 'items' to 'entries' in results dict (old keys still work)</li>
<li>changed results dict to allow getting values with results.key as well as results[key]</li>
<li>work around embedded ill-formed HTML with half a DOCTYPE</li>
<li>work around malformed Content-Type header</li>
<li>if character encoding is wrong, try several common ones before falling back to regexes (if this works, bozo_exception is set to CharacterEncodingOverride)</li>
<li>fixed character encoding issues in BaseHTMLProcessor by tracking encoding and converting from Unicode to raw strings before feeding data to sgmllib.SGMLParser</li>
<li>convert each value in results to Unicode (if possible), even if using regex-based parsing</li>
<li>re-added mxTidy support, but off by default; install mxTidy and set feedparser.TIDY_MARKUP=1 to enable it</li>
</ul>
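<p>The first two items in this list (old keys that still work, and <code>results.key</code> access) can be sketched with a small <code>dict</code> subclass. This is an illustration under assumptions, not the parser's actual class. The <code>ResultsDict</code> name and the <code>ALIASES</code> table are hypothetical, and only square-bracket lookups are aliased here.</p>

```python
class ResultsDict(dict):
    """Hypothetical stand-in for the parser's results dictionary."""

    # Old key on the left, new key on the right, as named in the 3.0b22 notes.
    ALIASES = {"channel": "feed", "items": "entries"}

    def __getitem__(self, key):
        # Translate an old key to its new name before the normal lookup.
        return dict.__getitem__(self, self.ALIASES.get(key, key))

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real dict
        # methods such as .keys() and .items() are never shadowed.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None
```

<p>One consequence of this design: <code>results.items</code> is still the ordinary <code>dict.items</code> method, so in this sketch the old <code>items</code> key is only reachable as <code>results["items"]</code>.</p>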
</dd>
</dl>]]>
      
    </content>
  </entry>
</feed>
