Universal Feed Parser 3.3 is out. You can download it at SourceForge. That package no longer includes the more than 2700 unit tests; they are now available separately.
The major new feature in this release is improved performance, thanks to a patch from Juri Pakaste. Under Python 2.2, this version runs twice as fast as previous versions. Under Python 2.3, it runs five times as fast. No kidding. Thanks, Juri. Juri is the project lead of Straw, a desktop aggregator for Linux, which uses the Universal Feed Parser.
Other changes in this release:
- Refactored the date parsing routines, and added a new public function, registerDateHandler().
- Added support for parsing more kinds of dates, including Korean, Greek, Hungarian, and MSSQL-style dates. Thanks to ytrewq1 for numerous patches and help refactoring the date handling code.
- In the “things nobody cares about but me” department, UFP now detects feeds served over HTTP with a non-XML Content-Type header (such as text/plain) and sets bozo_exception to NonXMLContentType. Such feeds can never be well-formed XML; in fact, they should not be treated as XML at all. (Note that not everyone shares this view.)
- Documented UFP’s relative link resolution.
- Fixed problem tracking xml:base and xml:lang when one element declares it, its child doesn’t override it, its first grandchild does override it, but then its second grandchild doesn’t.
- Use Content-Language HTTP header as the default language, if no xml:lang attribute, <language> element, or <dc:language> element is present.
- Optimized EBCDIC to ASCII conversion.
- Added zopeCompatibilityHack(), which makes the parse() routine return a regular dict instead of a subclass. I have been told that this is required for Zope compatibility (hence the name). It also makes command-line debugging easier, since the pprint module inexplicably pretty-prints real dictionaries differently than dict subclasses.
- Support xml:lang="" for setting the current language to “unknown.” This behavior is straight from the XML specification. Anyone who tells you that good specs don’t matter is lying, or ignorant, or trying to sell you a bad one, or… hey look, shiny objects!
- Recognize RSS 1.0 feeds as version="rss10" even when the RSS 1.0 namespace is not the default namespace.
- Expose the status code on HTTP 303 redirects.
- Don’t overwrite the final status on redirects, in the case where redirecting to a URL returns a 304, or another redirect, or any non-200 status code.
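
To give a feel for the date handling change, here is a minimal standalone sketch of a handler registry in the spirit of registerDateHandler(). It is not UFP's own code: the names register_date_handler, parse_date, and parse_mssql_style are hypothetical, and the regular expression for an MSSQL-style date is an assumption made for illustration.

```python
import re
from datetime import datetime

# Hypothetical registry (not UFP's internals). Handlers registered later
# are tried first, so a custom handler can take priority over older ones.
_date_handlers = []


def register_date_handler(func):
    """Register a callable that maps a date string to a time tuple or None."""
    _date_handlers.insert(0, func)


def parse_date(date_string):
    """Return the first non-None handler result, or None if nothing matches."""
    for handler in _date_handlers:
        try:
            result = handler(date_string)
        except (ValueError, TypeError):
            # A handler that chokes on the input simply does not match.
            continue
        if result is not None:
            return result
    return None


# Assumed shape of an MSSQL-style date: "YYYY-MM-DD HH:MM:SS" with an
# optional fractional-seconds suffix, e.g. "2004-07-08 23:56:58.0".
_MSSQL_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(?:\.\d+)?$"
)


def parse_mssql_style(date_string):
    """Parse the assumed MSSQL-style format; no time zone handling is done."""
    match = _MSSQL_RE.match(date_string.strip())
    if match is None:
        return None
    year, month, day, hour, minute, second = (int(g) for g in match.groups())
    # datetime() raises ValueError for impossible dates such as month 13.
    return datetime(year, month, day, hour, minute, second).timetuple()


register_date_handler(parse_mssql_style)
```

With a registry like this, supporting one more date format means writing one small function and registering it; the parsing loop itself never changes.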

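The language rules in the list above (inherit xml:lang from the parent, let a child override it, treat xml:lang="" as “unknown,” and fall back to a default such as the Content-Language header) can be sketched with the standard library alone. This is an illustration, not UFP's implementation: collect_languages and its default_lang parameter are hypothetical names, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def collect_languages(element, default_lang=None, out=None):
    """Return (tag, language) pairs in document order.

    default_lang stands in for whatever the caller knows from outside
    the document, for example a Content-Language HTTP header.
    """
    if out is None:
        out = []
    # An element's own xml:lang wins; otherwise it inherits from its parent.
    lang = element.get(XML_LANG, default_lang)
    if lang == "":
        # xml:lang="" resets the language to "unknown".
        lang = None
    out.append((element.tag, lang))
    for child in element:
        # Each child starts from this element's language, so an override
        # in one sibling never leaks into the next sibling.
        collect_languages(child, lang, out)
    return out


SAMPLE = """
<feed xml:lang="en">
  <entry>
    <title xml:lang="de">eins</title>
    <summary>two</summary>
    <content xml:lang="">???</content>
  </entry>
</feed>
"""

languages = collect_languages(ET.fromstring(SAMPLE))
```

The sample covers the awkward case from the list: the root declares a language, its child does not override it, the first grandchild does, and the second grandchild goes back to the inherited value.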
