[These notes will eventually become part of a tech talk on video encoding. List of all articles in this series.]
The most important consideration in video encoding is choosing a video codec. A future article will talk about how to pick the one that’s right for you, but for now I just want to introduce the concept and describe the playing field. (This information is likely to go out of date quickly; future readers, be aware that this was written in December 2008.)
When you talk about “watching a video,” you’re probably talking about a combination of one video stream, one audio stream, and possibly some subtitles or captions. But you probably don’t have two different files; you just have “the video.” Maybe it’s an AVI file, or an MP4 file. These are just container formats, like a ZIP file that contains multiple kinds of files within it. The container format defines how to store the video and audio streams in a single file (and subtitles too, if any).
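When you “watch a video,” your video player is doing several things at once:
1. Interpreting the container format to find out which video and audio streams are available, and how they are stored within the file
2. Decoding the video stream and displaying a series of images on the screen
3. Decoding the audio stream and sending the sound to your speakers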
A video codec is an algorithm by which a video stream is encoded, i.e. it specifies how to do #2 above. Your video player decodes the video stream according to the video codec, then displays a series of images, or “frames,” on the screen. Most modern video codecs use all sorts of tricks to minimize the amount of information required to display one frame after the next. For example, instead of storing each individual frame (like a screenshot), they will only store the differences between frames. Most videos don’t actually change all that much from one frame to the next, so this allows for high compression rates, which results in smaller file sizes. (There are many, many other complicated tricks too, which I’ll dive into in a future article.)
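To make the “only store the differences” idea concrete, here is a minimal sketch in Python. It is a toy, not how any real codec works: the frames are tiny grids of grayscale values, the function names are made up for illustration, and the differencing is lossless (real codecs layer motion compensation and lossy transforms on top of this basic idea).

```python
def diff_frames(prev, curr):
    """Return only the pixels that changed, as (row, col, new_value) tuples."""
    changes = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (old, new) in enumerate(zip(prev_row, curr_row)):
            if old != new:
                changes.append((r, c, new))
    return changes


def apply_diff(prev, changes):
    """Rebuild the next frame from the previous frame plus its stored changes."""
    frame = [row[:] for row in prev]  # copy, so the previous frame is untouched
    for r, c, value in changes:
        frame[r][c] = value
    return frame


def encode(frames):
    """Store the first frame whole, then only the differences for later frames."""
    deltas = [diff_frames(a, b) for a, b in zip(frames, frames[1:])]
    return frames[0], deltas


def decode(key_frame, deltas):
    """Replay the stored differences on top of the first frame."""
    frames = [key_frame]
    for changes in deltas:
        frames.append(apply_diff(frames[-1], changes))
    return frames


def make_frame(col):
    """A 4x4 black frame with one bright pixel in row 1 at the given column."""
    frame = [[0] * 4 for _ in range(4)]
    frame[1][col] = 255
    return frame


# A four-frame "video" in which the bright pixel moves one step right per frame.
video = [make_frame(col) for col in range(4)]
key_frame, deltas = encode(video)

raw_values = sum(len(row) for frame in video for row in frame)
stored_values = sum(len(row) for row in key_frame) + sum(len(d) for d in deltas)
print(raw_values, "pixel values raw;", stored_values, "entries stored")
```

Each later frame costs only two stored entries (one pixel switching off, one switching on) instead of sixteen, which is the whole trick: the less the picture changes, the less there is to store.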
There are lossy and lossless video codecs; today’s article will only deal with lossy codecs. A lossy video codec means that information is being irretrievably lost during encoding. Like copying an audio cassette tape, you’re losing information about the source video, and degrading the quality, every time you encode. Instead of the “hiss” of an audio cassette, a re-re-re-encoded video may look blocky, especially during scenes with a lot of motion. (Actually, this can happen even if you encode straight from the original source, if you choose a poor video codec or pass it the wrong set of parameters.) On the bright side, lossy video codecs can offer amazing compression rates, and many offer ways to “cheat” and smooth over that blockiness during playback, to make the loss less noticeable to the human eye.
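The cassette-copy effect is easy to demonstrate with a toy “codec.” The sketch below is an assumption-laden stand-in, not a real encoder: `lossy_encode` is a made-up function that simply rounds each sample to the nearest multiple of a step size, and the step sizes are arbitrary. It only illustrates that each lossy pass works from the already-damaged copy, so the error against the original piles up.

```python
def lossy_encode(samples, step):
    """Toy 'lossy codec': snap every sample to the nearest multiple of step."""
    return [((s + step // 2) // step) * step for s in samples]


def total_error(original, degraded):
    """Sum of absolute differences between the source and a degraded copy."""
    return sum(abs(a - b) for a, b in zip(original, degraded))


original = [3, 17, 42, 58, 91, 120, 200, 233]

# Three generations, each encoded from the previous generation's output,
# with different (hypothetical) encoder settings along the way.
generations = []
current = original
for step in (10, 16, 10):
    current = lossy_encode(current, step)
    generations.append(current)

errors = [total_error(original, g) for g in generations]
print(errors)
```

The first and third passes use the same step size, yet the third generation is much further from the source than the first. Encoding straight from the original with that setting would have done far less damage than reaching it through two intermediate copies.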
There are tons of video codecs. Today I’ll discuss five modern lossy video codecs: MPEG-4 ASP, H.264, VC-1, Theora, and Dirac.
MPEG-4 ASP, a.k.a. “MPEG-4 Advanced Simple Profile,” was developed by the MPEG group and standardized in 2001. You may have heard of DivX, Xvid, or 3ivx; these are all competing implementations of the MPEG-4 ASP standard. Xvid is open source; DivX and 3ivx are closed source. The company behind DivX has had some mainstream success in branding “DivX” as synonymous with “MPEG-4 ASP.” For example, a “DivX-certified” DVD player can actually play most MPEG-4 ASP videos in an AVI container, even if they were created with a competing encoder. (To confuse things even further, the company behind DivX has now created their own container format.)
MPEG-4 ASP is patent-encumbered; licensing is brokered through the MPEG LA group. MPEG-4 ASP video can be embedded in most popular container formats, including AVI, MP4, and MKV.
H.264, a.k.a. “MPEG-4 part 10,” a.k.a. “MPEG-4 AVC,” a.k.a. “MPEG-4 Advanced Video Coding,” was also developed by the MPEG group (jointly with the ITU-T’s Video Coding Experts Group) and standardized in 2003. It aims to provide a single codec for low-bandwidth, low-CPU devices (cell phones); high-bandwidth, high-CPU devices (modern desktop computers); and everything in between. To accomplish this, the H.264 standard is split into “profiles,” which each define a set of optional features that trade complexity for file size. Higher profiles use more optional features, offer better visual quality at smaller file sizes, take longer to encode, and require more CPU power to decode in real time.
To give you a rough idea of the range of profiles, Apple’s iPhone supports Baseline profile, the Apple TV set-top box supports Baseline and Main profiles, and Adobe Flash on a desktop PC supports Baseline, Main, and High profiles. YouTube (owned by Google, my employer) now uses H.264 to encode high-definition videos, playable through Adobe Flash; YouTube also provides H.264-encoded video to mobile devices, including Apple’s iPhone and phones running Google’s Android mobile operating system. Also, H.264 is one of the video codecs mandated by the Blu-Ray specification; Blu-Ray discs that use it generally use the High profile.
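If you are encoding one file for several targets, the practical question is “what is the highest profile they can all decode?” Here is a small sketch of that lookup, using only the three devices and three profiles named above. The names `SUPPORTED_PROFILES` and `highest_common_profile` are made up for illustration, and treating the profiles as a simple low-to-high ladder is a simplification (the real standard defines more profiles, and they are not strictly nested).

```python
# Profiles from least to most demanding, as described in this article.
PROFILE_ORDER = ["Baseline", "Main", "High"]

# Which profiles each playback target supports, per the examples above.
SUPPORTED_PROFILES = {
    "iPhone": {"Baseline"},
    "Apple TV": {"Baseline", "Main"},
    "Flash on desktop PC": {"Baseline", "Main", "High"},
}


def can_play(device, profile):
    """True if the named device lists support for the given profile."""
    return profile in SUPPORTED_PROFILES[device]


def highest_common_profile(devices):
    """Pick the most demanding profile that every listed device can decode."""
    common = set(PROFILE_ORDER)
    for device in devices:
        common &= SUPPORTED_PROFILES[device]
    for profile in reversed(PROFILE_ORDER):
        if profile in common:
            return profile
    return None  # no profile is shared by all of the devices


print(highest_common_profile(["Apple TV", "Flash on desktop PC"]))
print(highest_common_profile(["iPhone", "Apple TV", "Flash on desktop PC"]))
```

Adding the iPhone to the list drags the answer down to Baseline, which is the trade-off in a nutshell: the widest compatibility costs you the optional features that make higher profiles more efficient.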
Most non-PC devices that play H.264 video (including iPhones and standalone Blu-Ray players) actually do the decoding on a dedicated chip, since their main CPUs are nowhere near powerful enough to decode the video in real-time. Recent high-end desktop graphics cards also support decoding H.264 in hardware. There are a number of competing H.264 encoders, including the open source x264 library. The H.264 standard is patent-encumbered; licensing is brokered through the MPEG LA group. H.264 video can be embedded in most popular container formats, including MP4 (used primarily by Apple’s iTunes Store) and MKV (used primarily by video pirates).
VC-1 evolved from Microsoft’s WMV9 codec and was standardized in 2006. It is primarily used and promoted by Microsoft for high-definition video, although, like H.264, it has a range of profiles to trade complexity for file size. Also like H.264, it is mandated by the Blu-Ray specification, and all Blu-Ray players are required to be able to decode it. The VC-1 codec is patent-encumbered, with licensing brokered through the MPEG LA group.
Wikipedia has a brief technical comparison of VC-1 and H.264; Microsoft has their own comparison; Multimedia.cx has a pretty Venn diagram outlining the similarities and differences. Multimedia.cx also discusses the technical features of VC-1. I also found this history of VC-1 and H.264 to be interesting (as well as this rebuttal).
VC-1 is designed to be container-independent, although it is most often embedded in an ASF container. An open source decoder for VC-1 video was a 2006 Google Summer of Code project, and the resulting code was added to the multi-faceted ffmpeg library.
Theora evolved from the VP3 codec and has subsequently been developed by the Xiph.org Foundation. Theora is a royalty-free codec and is not encumbered by any known patents other than the original VP3 patents, which have been irrevocably licensed royalty-free. Although the standard has been “frozen” since 2004, the Theora project (which includes an open source reference encoder and decoder) only hit 1.0 in November 2008.
Theora video can be embedded in any container format, although it is most often seen in an Ogg container. All major Linux distributions support Theora out-of-the-box, and Mozilla Firefox 3.1 will include native support for Theora video in an Ogg container. And by “native”, I mean “available on all platforms without platform-specific plugins.” You can also play Theora video on Windows or on Mac OS X after installing Xiph.org’s open source decoder software.
The reference encoder included in Theora 1.0 is widely criticized for being slow and poor quality, but Theora 1.1 will include a new encoder that takes better advantage of Theora’s features, while staying backward-compatible with current decoders.
Dirac was developed by the BBC to provide a royalty-free alternative to H.264 and VC-1 that the BBC could use to stream high-definition television content in Great Britain. Like H.264, Dirac aims to provide a single codec for the full spectrum of very low- and very high-bandwidth streaming. Dirac is not encumbered by any known patents, and there are two open source implementations, dirac-research (the BBC’s reference implementation) and Schroedinger (optimized for speed).
The Dirac standard was only finalized in 2008, so there is very little mainstream use yet, although the BBC did use it internally during the 2008 Olympics. Dirac-encoded video tracks can be embedded in several popular container formats, including MP4, Ogg, MKV, and AVI. VLC 0.9.2 (released in September 2008) can play Dirac-encoded video within an Ogg or MP4 container.
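To pull the five codecs together, here is a rough sketch of the pairings mentioned in this article as a lookup table. It is deliberately incomplete: it records only the containers named above (several codecs can live in other containers too), the `royalty_free` flag is my shorthand for “not patent-encumbered” as described in each section, and the `Codec` class and `codecs_for` function are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str
    royalty_free: bool
    containers: tuple  # containers named in this article, not an exhaustive list


CODECS = [
    Codec("MPEG-4 ASP", False, ("AVI", "MP4", "MKV")),
    Codec("H.264", False, ("MP4", "MKV")),
    Codec("VC-1", False, ("ASF",)),
    Codec("Theora", True, ("Ogg",)),
    Codec("Dirac", True, ("MP4", "Ogg", "MKV", "AVI")),
]


def codecs_for(container, royalty_free_only=False):
    """List the codecs this article pairs with a container, optionally royalty-free only."""
    return [
        codec.name
        for codec in CODECS
        if container in codec.containers
        and (codec.royalty_free or not royalty_free_only)
    ]


print(codecs_for("MKV"))
print(codecs_for("MP4", royalty_free_only=True))
```

Which combinations a given player or device will actually accept is a separate question; this only captures what can be stored where.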
Of course, this is only scratching the surface of all the available video codecs. Video encoding goes way back, but my focus in this series is on the present and near-future, not the past. If you like, you can read about MPEG-2 (used in DVDs), MPEG-1 (used in Video CDs), older versions of Microsoft’s WMV family, Sorenson, Indeo, and Cinepak.
Tomorrow: audio codecs!
§
Good rundown! Can’t wait to see the technical rundown. “B-Frames for the Common Man”.
(Small note: Pixlet is a lossless codec.)
— dc
Could you make all the occurrences of “patent-encumbered” and “not encumbered” bold, or otherwise change the bullet points to look different? :)
@dc: thanks, I removed the reference to Pixlet.
— Mark
@anonymous: done. Also added some paragraph breaks to alleviate the WALLOFTEXT effect.
— Mark
Great run-down. You may also want to mention that Xvid is a Free Software implementation, and that Xvid and x264 are the preferred encoders amongst “pirates” for their respective formats.
So if any codec should be specified in HTML5, it could only be Dirac? (Theora has already been refused, if I remember correctly.)
— Rik
The companies that opposed Theora would no doubt oppose Dirac for the same reasons… submarine patent risk, strategic interest in a competing standard, and so on.
— Mark
@Dave: I reorganized that paragraph a bit and added a link to xvid.org.
— Mark
Are you sure Pixlet is lossless? If I look at some test data then it seems very much like it is lossy.
Good write up on Theora. You hit all the benefits. One suggestion, in comparing Theora to Dirac, one needs to look at the encoding techniques. Dirac was created to take advantage of wavelets, where Theora does not encode with this technique. What you will find is Theora better for some video and Dirac better for other video, because of the two different techniques.
As for the criticism you mention, I have found that people used Theora back in the days it was in alpha-quality and never checked it again. Also, they may be using software that is using old libraries, too. I have been using Totem (default GNOME media player) and Kaffeine (default KDE media player) which both use up to date Theora implementations. The quality of Theora decoding is right on par with other formats of the same file size. I provided a link to lots of online Theora videos above, for your review. When factoring in the lack of license fees, lack of copy restrictions, and the fast-paced community development efforts, Theora is a very compelling choice.
One last thing: not only will Firefox 3.1 support the HTML5 video tag, but so will the next versions of Opera and Safari.
— Matthew
@David: Xvid and x264 cannot be free software, at least in most countries, because the MPEG software patents make the software not free to distribute.
Read up on those MPEG-4 license fees that will begin in 2010!
“Recent high-end desktop graphics cards” should be “Recent desktop graphics cards”; see, e.g., HD Video Playback With A $20 CPU & $30 GPU On Linux and Can a sub-$100 graphics card get the job done?.
DONUT STOP!
One thing that might be useful as part of this series is a note on what “patent-encumbered” actually means. Although…that’s shading perilously close to legal advice, I admit.
@Matthew: I assume Opera will support Theora natively (they have an experimental build that does so already), but will Safari? That’s news to me.
— Mark
@Matthew: in a future article, I’ll try to explain the technical differences in a reasonable fashion.
— Mark
Pixlet is lossy, but like Dirac Pro it only uses intra-frame compression, so they are really in a different category. Editing codecs, maybe?
Safari is slated, as far as I’m aware, to support the HTML5 video tag but not to support baked-in Theora, which means that Safari will presumably play those formats that QuickTime supports on your machine.
It might be worth mentioning that MPEG-4 Simple Profile (without “Advanced”) is the dominant video codec on mobile phones previous to the iPhone/Android generation (i.e. S60). It’s also the non-H.264 “MPEG-4” codec supported by QuickTime since QuickTime 6.
Regarding Safari: Safari shipped with HTML5 video element support some time ago already. XiphQT is needed for Ogg/Theora support.
Thanks Henri, that was my understanding as well (re: Safari). As for which containers, codecs, and profiles are supported by which devices and platforms — that’s a whole other article unto itself. Possibly two.
— Mark
I don’t want to use proprietary codecs. I have several computers; I deleted Vista and only use an iMac for learning about the OS. In my opinion, companies can take their proprietary codecs with them; I’m not going to use them.
That’s all folks.
I think you should talk more about the MPEG-2 video codec. As old as it may be, it’s still extremely relevant for the future given its use in digital TV.
§
© 2001–9 Mark Pilgrim