dive into mark

Wednesday, March 5, 2008

Universal Encoding Detector 1.0.1 is out

Despite a complete lack of fanfare or self-promotion, much of the Python-loving world seems to have found my Universal Encoding Detector, which is a pure-Python port of Mozilla’s encoding detection. UED is used in a variety of end-user applications and other developer libraries, including:

And probably some others I don’t know about.

This is what it feels like to be an upstream author. And I use the term “author” loosely, since all I did was port somebody else’s wicked-smart algorithm, introduce new bugs, and write a few incoherent pages of documentation. But still, it is humbling to step back and observe the enormous worldwide community that is constantly packaging, updating, integrating, and distributing this stuff.

Anyway, version 1.0.1 is out, with a whopping two bugs fixed. Sorry it’s so late, but I was busy practicing witchcraft and becoming a lesbian.

Yeah, I didn’t see that coming either.
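For anyone who has been using the detector without knowing it, here is a rough sketch of what calling it looks like. It assumes a current Python 3 release of the `chardet` package, whose `chardet.detect` function takes bytes and returns a dictionary with `encoding` and `confidence` keys. The helper names `guess_encoding` and `to_text` are made up for this example and are not part of the library. The import is optional, so the code still runs (with a plain default) when chardet is not installed.

```python
# Optional dependency: use chardet when it is installed, otherwise carry on without it.
try:
    import chardet
except ImportError:
    chardet = None


def guess_encoding(raw, default="utf-8"):
    """Return a best-guess encoding name for a byte string.

    guess_encoding and its default parameter are hypothetical names for this sketch.
    """
    if chardet is not None:
        # chardet.detect returns a dict with 'encoding' and 'confidence' keys;
        # 'encoding' may be None when the detector cannot decide.
        result = chardet.detect(raw)
        if result.get("encoding"):
            return result["encoding"]
    return default


def to_text(raw, default="utf-8"):
    """Decode bytes with the guessed encoding, replacing undecodable bytes."""
    encoding = guess_encoding(raw, default)
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # The detector named an encoding this Python build has no codec for.
        return raw.decode(default, errors="replace")
```

The detector only ever offers a guess, so it is worth looking at the `confidence` value before trusting the result on short or ambiguous input.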

7 comments

  1. Wow, being a Venus user, I’ve been using this one without even knowing it. Nice!

    Comment by Scott Johnson — Wednesday, March 5, 2008 @ 4:48 pm

  2. Clarification: Planet and Venus will make use of chardet if it is already installed, but do not include or distribute it. The same is true for html5lib.

    Comment by Sam Ruby — Wednesday, March 5, 2008 @ 7:09 pm

  3. I’m coding a stupid Trac-like tool for Perforce, and I had problems importing some change list (set) comments into it, but now it’s all over thanks to your chardet.

    Thanks !
    Benjamin.

    http://code.google.com/p/p4watch/source/diff?r=9&format=side&path=/trunk/django_web/p4populate.py

    Comment by Benjamin Sergeant — Thursday, March 6, 2008 @ 12:32 am

  4. I didn’t even know it was yours! You rock.

    Comment by Zack — Thursday, March 6, 2008 @ 1:18 am

  5. Pingback by links for 2008-03-06 « Breyten’s Dev Blog

  6. How ironic that the Mozilla page on universal character encoding detection is being served with the incorrect character encoding.

    Perhaps [lazyweb]someone[lazyweb] should also port this code to Apache.

    Comment by Dotan Dimet — Friday, March 7, 2008 @ 7:10 pm

  7. #6: The fastest way to do that is to write a mod_perl handler (it need not be very large) and hook it up with Encode::Detect, which is the Perl interface to Mozilla’s nsUniversalDetector.

    http://search.cpan.org/dist/Encode-Detect/

    Comment by Anonymous — Saturday, March 8, 2008 @ 10:37 pm

© 2001-8 Mark Pilgrim