The first 90% of John Gruber’s And Oranges is excellent. Everyone should read it, and I’m not just saying that because it’s all about me. Unfortunately, the last 10% goes right off the rails, so naturally that’s where I’m going to start.
John writes:
And the truth is I’m not entirely sure he’s making the right decision, even for himself. Forget all the niggling details he cites, and focus only on his central beef — that Apple is a company that does not “get” openness, and that this deficiency is going to hinder Pilgrim’s long-term access to the data he’s creating. But if that’s the case, and Pilgrim has been using Apple computers for 22 years, why hasn’t it happened already?
It has happened already, John. Over and over again.
Years of hacking on an Apple //e, writing programs in Applesoft BASIC, Apple Pascal, and 6502 assembly language. All for a platform that no longer exists and can only be emulated with the help of ROMs which are illegal to redistribute. Years of writing bad poetry, short stories, and letters in AppleWriter and later AppleWorks. At one point in the distant past I bought a copy of MacLinkPlus and converted them all (not 100% faithfully) to Word, and later converted those (again, not 100% faithfully) to RTF. (How can you fuck up converting text files, I hear you ask. Well, I had this e. e. cummings thing going for a couple of years, in which whitespace was supremely significant. Conversion sometimes lost that, or muddled it. This also helps to explain why I fell in love with Python instead of Perl, but never mind that. Then there’s the whole character encoding problem.)
Years of hacking on various Macs, including a Mac LC, Mac IIci, and PowerMac 8500. Everything I wrote targeted OS 6 through 9, using Apple-specific toolkits and libraries. None of it runs natively in OS X, so none of it will run on modern Intel Macs (or any other platform). It can only be emulated with the help of ROMs which, once again, are illegal to redistribute. Years of writing school papers, newspaper articles, more bad poetry, and half a novel in a never-ending stream of word processors (WriteNow, MacWrite, ClarisWorks, Word, etc). I managed to convert some of these to modern formats, but again, not 100% faithfully.
Not much, honestly. I was stoned most of the time. I can’t blame that on Apple.
Years of creating content, most recently video content in iMovie. Home movies of my children being born and growing up, heavily edited and burned to DVD and distributed to friends and family. Plus a few screencasts and some other odd video projects I’ve never released. Years of tagging and organizing an ever-growing collection of music, photos, and multimedia. I’ve now exported all my home movies as .DV files — one for the final product, one for all the unused clips. All other edits are lost. All editability is lost. All my iTunes ratings and playlists are lost. All my iPhoto tags and ratings are lost. John has heard this part already; I can’t imagine why he thought it was an isolated occurrence.
Oh, and let’s not even talk about all the mail programs I’ve used. Eudora, Claris Emailer, Outlook, Outlook Express, Pine, Elm, and no doubt a few I’ve forgotten. When I finally came to my senses in 2001, I somehow managed to collect, convert, and salvage most of my mail and landed it all in Mail.app. I specifically chose Mail.app because I knew that it stored everything in mbox format, and that that was the oldest, most stable, safest choice for long-term preservation.
And then came Tiger, and Mail.app 2.0. In Mac OS X 10.4, Apple deliberately changed Mail.app to use their proprietary .emlx data format, apparently to work around the limitations of Spotlight. Mail.app 2.0 helpfully auto-converted all my wonderful mbox files into Apple’s shitty undocumented format. I’m now in the process of undoing the damage. I tried an emlx-to-mbox converter program, but it has bugs that ruin certain mail messages and corrupt the resulting mbox file. (Specifically, mail messages that contain a line that starts with the word “From”, which is also how mbox marks the start of each message.) Perhaps JWZ’s emlx.pl script will fare better. JWZ knows mail. [Update: thanks to everyone who emailed me suggesting I “think different” about the problem. After an hour of wrangling configuration files and cursing Apple, I successfully set up Dovecot and migrated all my mail via IMAP. Whoops, no I didn’t. Somehow it gets stuck and only exports selected messages from large folders, somewhere around 1000 messages. Can’t see the pattern as to which messages get dropped. Perhaps I could split my 66,000 messages into 66 different folders and then reassemble them on the other side.] [Update 2: no thanks to macosxhints.com which suggests using Mail.app’s “Save as” feature to export my messages. This feature is so broken as to be useless.]
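For readers who have not hit this class of bug: mbox separates messages with a line beginning "From " (capital F, trailing space), so any body line that starts the same way has to be escaped before it is written out. The sketch below shows one common convention, usually called mboxrd quoting. It is not the converter mentioned above, and the function names are made up for illustration.

```python
import re

# A body line that is "From " preceded by zero or more ">" characters.
# MULTILINE makes ^ match at the start of every line, not just the first.
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)


def escape_from_lines(body: bytes) -> bytes:
    """Prefix every 'From ' (or '>From ', '>>From ', ...) line with one '>'."""
    return _FROM_LINE.sub(rb">\1", body)


def unescape_from_lines(body: bytes) -> bytes:
    """Reverse escape_from_lines by stripping exactly one leading '>'."""
    return _QUOTED_FROM_LINE.sub(rb"\1", body)


if __name__ == "__main__":
    original = b"Hi,\nFrom here on it gets tricky.\n>From a quoted reply\n"
    escaped = escape_from_lines(original)
    print(escaped.decode("ascii"))
    # The round trip must give back the original bytes exactly.
    print(unescape_from_lines(escaped) == original)
```

A converter that skips the escaping step writes a body line that looks like a message separator, which is enough to split one message in two and throw off everything after it in the file.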
This was really the last straw for me. I was already feeling vaguely dissatisfied with Apple; now I feel actively betrayed. By the time I even realized what had happened (a year after buying OS X 10.4), it was too late. Now I’m forced to migrate all my mail yet again from yet another proprietary format, and the best documentation I’ve found so far is on LiveJournal. Jesus H. Christ, somebody deserves to be fired for that.
There’s an important lesson in here somewhere. Long-term data preservation is like long-term backup: a series of short-term formats, punctuated by a series of migrations. But migrating between data formats is not like copying raw data from one medium to another. If I can plug both types of media into the same computer (or even the same network), I can migrate raw data from one generation to the next (I just did it with my ReadyNAS). Then there are various things you can do (checksums and so forth) to verify that the data was copied 100% correctly. But converting data into a different format is much trickier, and there’s the potential of data loss or data degradation at every turn.
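The "checksums and so forth" step for a raw copy can be sketched in a few lines. This is a minimal illustration, not the procedure used for the ReadyNAS migration above. The function names are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_dir, dest_dir):
    """Return the relative paths that are missing or differ in dest_dir."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(str(rel))
    return problems
```

An empty list from `verify_copy` means every source file has a byte-identical twin at the destination. There is no equivalent check for a format conversion, because the output is supposed to differ from the input.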
Fidelity is not a binary thing. Data can gradually degrade with each conversion until you’re left with crap. People think this only affects the analog world, like copying cassette tapes for several generations. But I think digital preservation is actually much harder, in part because people don’t even realize that it has the same issues.
(Of course, sometimes fidelity is a binary thing. Why do I avoid DRM? Because the entire point of DRM is to make migration impossible, to reduce the fidelity of your conversion to 0. Apple’s iTunes DRM is actually the oddball here, since it is technically possible to migrate the songs you buy from the iTunes Music Store. Of course, you have to burn the songs onto a CD (assuming iTunes will let you) and then you can re-rip them in the format of your choice. This involves some loss of fidelity, but at least it’s technically possible. Other DRM schemes are even worse. But note that Apple’s DRM scheme has gotten worse since they first introduced it. That alone should be enough of a deterrent for people, but apparently it isn’t.)
So if you care about long-term data preservation, your #1 goal should be to reduce the number of times you convert your data from one format to another. You should also strive to increase the fidelity of each conversion, but you may not have any control over that when the time comes. Plus, you may not know in advance how faithful the conversion will be, so planning ahead to reduce the number of conversions is a better bet.
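A toy model makes the trade-off concrete. Assume, purely for illustration, that every conversion keeps the same fraction of whatever detail is left and that losses compound independently. Real conversions are not this tidy, and the numbers below are invented.

```python
def remaining_fidelity(per_conversion: float, conversions: int) -> float:
    """Fraction of the original detail left after repeated lossy conversions.

    per_conversion is the (hypothetical) fraction each conversion preserves,
    between 0.0 and 1.0.
    """
    if not 0.0 <= per_conversion <= 1.0:
        raise ValueError("per_conversion must be between 0 and 1")
    if conversions < 0:
        raise ValueError("conversions must not be negative")
    return per_conversion ** conversions


if __name__ == "__main__":
    # Two mediocre conversions versus ten fairly good ones.
    print(remaining_fidelity(0.95, 2))
    print(remaining_fidelity(0.98, 10))
```

Under these assumptions, two conversions that each keep 95% leave more behind than ten conversions that each keep 98%. That is why the count matters more than squeezing a little extra quality out of any single step.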
Once I realized this, I started thinking about the risk factors that would increase the number of conversions. Data readable by only one application is a big risk factor, because the application won’t be around forever. If that application only runs on one operating system, that’s even worse, because the operating system won’t be around forever either. If that operating system only runs on one hardware platform, that’s even worse still. No hardware lasts forever, and you may eventually need to resort to emulating the hardware in software. Emulation is the ultimate fallback. But if any or all of those layers are closed, emulation may be costly or even impossible. And if any of the layers are DRM-encumbered, emulating them may be illegal. Data preservation is an ogre, and ogres have layers.
In the extreme case, you can try to pick a format up front and never change it. Project Gutenberg insists on publishing their e-books as plain ASCII text, even their most recent ones. The conversion from paper introduces a number of errors, which they clean up by hand, but that’s it. They’re not ever planning to convert them to another digital format, so the data fidelity will never go down during a less-than-perfect conversion. And ASCII is old and stable and safe and upward compatible with newer encodings like UTF-8 and can be read by any program on any platform, so it’s a safe choice.
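The "upward compatible" claim is easy to check: every byte below 0x80 means the same thing in ASCII and in UTF-8, so a pure-ASCII file is already a valid UTF-8 file with identical contents. A small sketch, with an arbitrary sample string and made-up helper names:

```python
def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


def reads_same_in_both(data: bytes) -> bool:
    """True if the bytes decode to the same text as ASCII and as UTF-8."""
    if not is_plain_ascii(data):
        return False
    return data.decode("ascii") == data.decode("utf-8")


if __name__ == "__main__":
    sample = "Plain text, nothing fancy.\n".encode("ascii")
    print(reads_same_in_both(sample))

    # The compatibility only runs one way: UTF-8 text that uses
    # non-ASCII characters is not valid ASCII.
    accented = "caf\u00e9".encode("utf-8")
    print(is_plain_ascii(accented))
```

So an ASCII archive needs no conversion step at all to be read by software that expects UTF-8.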
Thinking in terms of risk factors, you can begin to see why I chose to switch away from Apple and onto a Free Software platform. Mac OS X is only available from Apple, and it only runs on Apple hardware. Running a Free Software operating system removes both of these risk factors at once. Furthermore, Apple has made it very clear that they will do everything in their power to protect this lock-in. Despite the fact that their Intel-based operating system could run on commodity hardware, Apple has intentionally crippled Mac OS X with code that checks the hardware to ensure that it came from Apple. They’re intentionally introducing friction between the layers, bolting their operating system onto their hardware. Some day there will be no hardware that can run Mac OS X, and because of Apple’s DRM it will be illegal to emulate it in software.
There are more risk factors in the layer above the OS, the application layer. I still need to be vigilant about the formats that specific applications use to store data I care about preserving. Open source != open formats, and there are many examples of undocumented and underdocumented data formats in open source applications. The GIMP is a particularly egregious example. Its default .xcf format can only be read by GIMP and is deliberately undocumented outside the source code. GIMP only exports to formats with massive fidelity loss (you can export the final result but not in any editable form that includes layers and effects and brushes and so on). There are only a handful of third-party converters, and none of them are anywhere near complete. This is no better than Microsoft Office; in fact, it’s probably worse. In practice, Microsoft Office documents have better interoperability, because third parties have spent more time reverse-engineering the formats and handling all the edge cases. (Third parties are working on reverse-engineering XCF too.)
Storing my data in open formats also mitigates several risk factors, but not the same factors as running Free Software. Open formats increase my chances of finding alternate applications that can read my data in its current form, which in turn increases my chances of being able to migrate one of the other layers (OS or hardware) without being forced to convert my data to another format. Open formats also increase my chances of maintaining data fidelity during conversion, since they decrease the difficulty of developing a converter that handles all possible cases. It’s always the edge cases that come back to bite you.
I’m not claiming that either Free Software or open formats are a silver bullet. There are many risk factors, and Free Software mitigates some of them some of the time. There are many layers — data on top of applications on top of operating systems on top of hardware — and open formats can reduce the friction between some of them some of the time. They’re both lubricants that help you to slide out one layer and replace it without the whole thing toppling down. Apple would prefer that I not replace any of their layers, and they have gone out of their way to increase the friction between them.
Which brings us back to John Gruber’s oranges. His counter-argument — that lock-in hasn’t been a problem for me yet, so why all the fuss now — could not be further from the truth. It’s been a constant problem for 22 years. Much of the data I’ve spent my life creating has been lost or seriously degraded through a series of proprietary formats and forced migrations. This is why I felt so betrayed, in particular, by Mail.app “upgrading” me away from mbox format. It took a lot of forethought on my part, not to mention actual time and effort, to convert all my disparate mail archives from all those different mail programs. I finally got everything into a single archive in an open, stable format… and just 3 short years later, Apple found a way to screw me one last time. It’ll be the last time they get the chance.
§
© 2001–9 Mark Pilgrim