[Part of an ongoing series.]

I had lunch with my father the other day, and I explained this series as well as I could to someone who didn’t start programming when he was 11. His immediate reaction was, “Why are there so many different formats? Why can’t everybody just agree on a single format? Is it political, or technical, or both?” The short answer is, it’s both. The history of video in any medium, and especially since the explosion of amateur digital video, has been marred by a string of companies who wanted to use container formats and video codecs as tools to lock content producers and content consumers into their little fiefdoms. Own the format, own the future. And when I say “history”, well, it’s still going on. Tried to play a Windows Media Video on Mac OS X lately? The codec and container support is out there, but it’s not baked in. Want to watch movie trailers on Apple.com? Please install QuickTime. And so forth and so on. The only thing that was pre-installed on both platforms was Flash, so when a few startups dipped their toes into the Internet video waters, the ones that used Flash Video won despite it being an objectively inferior codec. (Some revision of Flash 9 added support for H.264 video, AAC audio, and the MP4 container, which is what YouTube HD uses.)

So that’s the politics. But there are also technical barriers. As with all engineering, video encoding is primarily about constraints. I can think of 10 just off the top of my head:

  1. CPU capacity for decoding and playing in real time. This is one of the most important constraints, since video is meant to be watched in real time. That sounds simple, but it’s incredibly complex. Every video you’ve ever watched in your entire life had to be decoded and played in real time. Otherwise it stutters and the viewing experience sucks. And we’re talking about video here; if the viewing experience sucks, there’s nothing left. Some codecs are just more complex than others, and that translates into higher system requirements to decode videos in real time. As I’ve mentioned before, some codecs are now decoded by specialized hardware. iPhones have a little chip inside them that understands H.264 Baseline Profile; without that, the iPhone would need a Core 2 Duo processor to play movies, and it would have a battery life of 10 minutes.
  2. Codec compatibility. Normal people won’t download codecs or plug-ins just to watch a dog on a skateboard, or even to watch a trailer for a $100 million blockbuster. (Sadly, they will download plug-ins for porn, but those are invariably trojan horses. Or so I’ve read. Moving on…) The phone in your pocket can probably play AMR ringtones, maybe MP3 ringtones, but probably not Vorbis ringtones (unless you have an Android phone) — and you probably couldn’t download new codecs even if you wanted to (which, I must reiterate, nobody wants to). Apple and Real Networks tried for years to corner the web video market, but 99% of schmucks with a browser have Flash, so Flash video won on the web. Meanwhile, Firefox 3.1 will ship with support for the <video> element but will only support Theora and Vorbis in an Ogg container — even if your underlying operating system ships with other codecs.
  3. CPU capacity for encoding. Encoding takes a long time. Taking my home movie from iMovie to a DVD used to take 8 hours on a PowerBook G4 laptop. These days you can rip a DVD movie with Xvid in 30 minutes, or you can rip it with a more complex codec with all optional features turned on, and maybe it’ll still take 8 hours. It’ll look better, but will it look 16 times better? If you’re only doing it once, maybe you don’t care. If you’re running YouTube and people are uploading 13 hours of video every minute, maybe you do. CPU cycles aren’t free; at that scale, they’re not even cheap. (That’s a real statistic, by the way; I got it from the page on the Google intranet entitled “What can we tell non-Googlers?” and it’s accurate as of September 2008.)
  4. Acceptable delay between recording and delivery. In my own experience, videos I’ve uploaded on YouTube are available within minutes, which is just mind-boggling when you consider the volume. If you’re re-encoding a live stream, even a few minutes delay is probably unacceptable. That means you’ll need a faster encoder, a less complex codec, or lower quality settings.
  5. Audience size. It’s not a big secret that lots of video on the Internet looks like crap. Partly that’s because the video uploader uploaded crappy video, but it’s also because most Internet videos are only watched by a few people, and it’s just not a worthwhile tradeoff to spend 8 hours re-encoding it. On the other hand, if you’re mastering a DVD that’ll get sold to 10 million people, you’ll probably use higher quality settings.
  6. Screen dimensions. DVDs can’t store high-def 1920 x 1080 video because the standard doesn’t allow for it, which makes perfect sense because it was designed around the screen resolution of standard-def TVs. Blu-Ray ups the limit, but there’s still a limit. Screen sizes vary more for PC video, but there will always be practical upper limits depending on your audience.
  7. My bandwidth. If you’re streaming or downloading video, some percentage of your audience is probably living in a third-world country like the United States, with limited broadband access, slow speeds, and monthly bandwidth caps. Larger file size = longer wait to play = fewer videos watched overall.
  8. Your bandwidth. Obviously every bit I download is a bit that you upload, and bandwidth ain’t free either. “When I get a little money I buy bandwidth; and if any is left I buy food and clothes.” Or something like that.
  9. Hard limits on storage size. As I mentioned before, physical media has upper limits on total size. Commercial DVDs can hold upwards of 9 GB, which seems like a lot but really isn’t. Blu-Ray maxes out at 50 GB, which seems like a lot but really isn’t.
  10. Patents / licensing costs. Did I mention that most popular video codecs are patent-encumbered? This is why Wikimedia uses Theora exclusively, and why Firefox can ship a native Theora decoder but won’t ever ship H.264.

…and that’s the short list.
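Two of the constraints above, encoding CPU (item 3) and storage limits (item 9), reduce to simple arithmetic, so here is a rough sketch of it. The upload rate and disc sizes are the figures quoted in the list. Everything else is an assumption made for illustration: the function names are hypothetical, the movie is assumed to run two hours (which makes a 30-minute rip 4x real time and an 8-hour rip 0.25x), and gigabytes are treated as decimal.

```python
# Back-of-the-envelope numbers for two of the constraints above.
# Upload rate and disc sizes come from the text; encoder speeds and the
# two-hour running time are assumed values for illustration only.
import math


def encoders_needed(upload_hours_per_minute, encoder_speed):
    """Machines needed to keep pace with incoming uploads.

    encoder_speed is hours of video encoded per wall-clock hour
    (1.0 is real time; 0.25 means one hour of video takes four hours).
    """
    # Hours of video arriving per wall-clock hour.
    incoming = upload_hours_per_minute * 60
    return math.ceil(incoming / encoder_speed)


def average_bitrate_mbps(capacity_gb, running_time_hours):
    """Average total bitrate (video plus audio) that fits, in Mbit/s.

    Assumes decimal gigabytes (1 GB = 10**9 bytes).
    """
    bits = capacity_gb * 10**9 * 8
    seconds = running_time_hours * 3600
    return bits / seconds / 10**6


if __name__ == "__main__":
    for speed in (4.0, 1.0, 0.25):
        machines = encoders_needed(13, speed)
        print(f"encoder at {speed}x real time: {machines} machines")
    for name, gb in (("DVD", 9), ("Blu-Ray", 50)):
        rate = average_bitrate_mbps(gb, 2)
        print(f"{name}: {rate:.1f} Mbit/s average for a two-hour movie")
```

Under those assumptions, moving from the fast encoder to the slow one multiplies the machine count by the same factor of 16 as the encoding time, which is why the "will it look 16 times better?" question matters so much more at YouTube's scale than on a laptop.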

All of which leads me to the Zen of video encoding, which is this:

There is no right or wrong. There is only what works and what doesn’t.

If you can find even one combination of tools, delivery devices, and target platforms that satisfies your constraints and still accomplishes your goals, congratulations. You’re ahead of 99% of the people who’ve tried.
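One way to picture "satisfies your constraints" is as a plain filter over candidate settings. The sketch below is only an illustration: the class names, field names, and every number in it are hypothetical, and it covers just three of the ten constraints (codec compatibility, bandwidth or storage, and encoding speed).

```python
# A toy constraint check. All names and numbers are made up for
# illustration; real projects will have more constraints than these three.
from dataclasses import dataclass


@dataclass
class Candidate:
    codec: str
    bitrate_mbps: float   # average total bitrate of the encoded file
    encode_speed: float   # hours of video encoded per wall-clock hour


@dataclass
class Constraints:
    playable_codecs: set      # what the audience can decode out of the box
    max_bitrate_mbps: float   # set by viewer bandwidth or disc capacity
    min_encode_speed: float   # set by acceptable delay and available CPU


def violations(candidate, limits):
    """Return the constraints this candidate breaks (empty list means it works)."""
    problems = []
    if candidate.codec not in limits.playable_codecs:
        problems.append("codec not playable by audience")
    if candidate.bitrate_mbps > limits.max_bitrate_mbps:
        problems.append("bitrate too high")
    if candidate.encode_speed < limits.min_encode_speed:
        problems.append("encoding too slow")
    return problems


if __name__ == "__main__":
    limits = Constraints({"theora", "h264"}, max_bitrate_mbps=1.5, min_encode_speed=1.0)
    candidates = [
        Candidate("h264", 1.2, 0.25),   # looks great, encodes too slowly
        Candidate("wmv", 1.0, 4.0),     # fast, but the audience can't play it
        Candidate("theora", 1.4, 2.0),  # passes all three checks
    ]
    for c in candidates:
        print(c.codec, violations(c, limits) or "works")
```

Nothing here ranks the survivors or picks a "best" one. Any candidate that comes back with an empty list works, and that is the whole test.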

Tomorrow: specifics on common devices/platforms and their limitations.

§

Six comments here

  1. Points two and ten combine to form one of the impressive ironies of media encoding–free codecs are unlikely to ever come preinstalled on a proprietary system, even if it would cost the vendor nothing. (I mused about this elsewhere.) Why would Microsoft play FLACs seamlessly on Vista, when it can nudge everyone toward WMA Lossless? It’s understandable, but it’s horribly frustrating, and it’s a persistent cause of codec headaches that has absolutely no legitimate (technical) cause. (The Xiph.org folks have been putting more effort into their quicktime and dshow plugins, seeing that Windows and Mac compatibility are key, but it doesn’t address the root issue.)

    Similarly, Linux “is broken!” because it can’t play MP3s out of the box–this isn’t any distributor’s fault, but the people in a position to fix it aren’t likely to do so, because doing so would conflict with their business model without providing any real gain to them. It’s a long-term, intractable problem, and we’ll be stuck with it unless some free codec gains as much mindshare as MP3 had circa 1999, forcing proprietary vendors to include support.

    We got to that point with still images with JPEG, but it’s unlikely we’ll see an analogous standard in video or audio any time soon.

    Or, more succinctly: Free codecs are absent from proprietary systems because the vendors don’t want to include them. Proprietary codecs are absent from free systems because the vendors can’t include them.

    — grendelkhan #

  2. One thing I do not like about Apple and Sony is that they always use their own closed formats. (Trying to maintain the margin?)

    Apple is lucky to have Steve Jobs.

    — Mike #

  3. Nice series.

    By the way, if you are desperate to encode something with a file wart on the end named “.mp3”, look at the twolame package, which provides mpeg layer 2 audio encoding. You’ll bloat your encoding by about 10-20%, but the license for this codec is not nearly as bad. If it’s free enough to show up in debian’s massive archive, it may be worth a look.

    http://packages.debian.org/stable/sound/twolame

    — Avery #

  4. I forgot to add – many mp3 players will ingest the .mp2 file without batting an eyelid. Useful for encoding something for family or friends that can’t, or don’t, understand the complexity of encoding patent issues coupled with free software.

    “Here ya go Dad, enjoy this on your mp3 player.” (hands over mpeg layer 2 file)

    — Avery #

  5. “What can we tell non-Googlers?” That would be a neat page to publish to the outside world. (Assuming it doesn’t also have comments along the lines of “You shouldn’t mention $x, $y, or $z.”)

    — David #

  6. Actually, most of it is boring and vague, which is why I was so pleasantly surprised that the YouTube section was so interesting and specific.

    — Mark #

§

© 2001–9 Mark Pilgrim