Despite a complete lack of fanfare or self-promotion, much of the Python-loving world seems to have found my Universal Encoding Detector, which is a pure-Python port of Mozilla’s encoding detection. UED is used in a variety of end-user applications and other developer libraries, and probably some others I don’t know about.
This is what it feels like to be an upstream author. And I use the term “author” loosely, since all I did was port somebody else’s wicked-smart algorithm, introduce new bugs, and write a few incoherent pages of documentation. But still, it is humbling to step back and observe the enormous worldwide community that is constantly packaging, updating, integrating, and distributing this stuff.
Anyway, version 1.0.1 is out, with a whopping two bugs fixed. Sorry it’s so late, but I was busy practicing witchcraft and becoming a lesbian.
Yeah, I didn’t see that coming either.
§
Wow, being a Venus user, I’ve been using this one without even knowing it. Nice!
Clarification: Planet and Venus will make use of chardet if it is already installed, but do not include or distribute it. The same is true for html5lib.
— Sam Ruby
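The "use it if it is already installed" arrangement described above can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not the code Planet, Venus, or html5lib actually use; the `decode_bytes` helper and its `fallback` parameter are hypothetical names made up for this example. The only chardet call it relies on is `chardet.detect`, which returns a dictionary containing an `encoding` entry.

```python
try:
    import chardet  # optional dependency: used only when already installed
except ImportError:
    chardet = None


def decode_bytes(raw, fallback="utf-8"):
    """Decode raw bytes to text, guessing the encoding with chardet if present.

    Hypothetical helper for illustration; `fallback` is the codec used when
    chardet is missing or cannot make a guess.
    """
    encoding = None
    if chardet is not None:
        # chardet.detect returns a dict; its 'encoding' value may be None
        encoding = chardet.detect(raw).get("encoding")
    try:
        return raw.decode(encoding or fallback, errors="replace")
    except LookupError:
        # The detector named a codec this Python build does not know
        return raw.decode(fallback, errors="replace")
```

With this shape, the calling code behaves the same whether or not chardet is on the system; it only guesses better when the library is there.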
I’m coding a stupid Trac-like tool for Perforce, and I had problems importing some changelist (set) comments into it, but that’s all over now thanks to your chardet.
Thanks !
Benjamin.
http://code.google.com/p/p4watch/source/diff?r=9&format=side&path=/trunk/django_web/p4populate.py
I didn’t even know it was yours! You rock.
How ironic that the Mozilla page on universal character encoding detection is being served with the incorrect character encoding.
Perhaps [lazyweb]someone[/lazyweb] should also port this code to Apache.
#6: The fastest way to do that is to write a mod_perl handler, which need not be very large, and hook it up with Encode::Detect. This is the Perl interface to Mozilla’s nsUniversalDetector.
§
© 2001–9 Mark Pilgrim