One of the many cool features of Python is that you can dynamically graft new methods onto existing classes at runtime. (Objective-C can also do this; it calls them “categories”.) I do this all the time with the built-in Python XML parsing classes, which are great at parsing XML documents into objects, but which suck at actually using those objects productively. I know, I know, I could install PyXML and use XPath and so forth to navigate through the parsed XML document, but that’s a pain and usually overkill. Most XML documents are not that complex, and a few simple utility methods in the minidom classes can go a long way.
To dynamically graft a method onto an existing class, you use the new module. First, define a function (outside of any existing class) as if it were an instance method, with self as the first argument and so forth. Then call new.instancemethod with three arguments: that function, None (there is no instance to bind it to), and the class you want it grafted onto. new.instancemethod returns an unbound method, which you can simply assign to the class under the name you want. For instance:
from xml.dom import minidom
import urllib
import new

# add useful 'first' method to minidom.Element class
def _first(self, path):
    """get first node, possibly recursively

    this is like a poor-man's XPath; the path parameter can look like this:
    "channel/title", which will find the first 'channel' node off self,
    then the first 'title' node off the channel node"""
    element = self
    for name in path.split("/"):
        element = element.all(name)[0]
    return element
minidom.Element.first = new.instancemethod(_first, None, minidom.Element)
This creates a first method on the minidom.Element class which, when called, executes the code that we defined in our _first function. So element.first(path) is now the same as _first(element, path).
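The new module exists only in Python 2; it was removed in Python 3. As a minimal sketch, assuming Python 3, the same grafting can be done by assigning the function directly to the class, since a plain function stored on a class turns into a method when it is looked up on an instance. The Greeter class and the _shout function are hypothetical names used only for illustration.

```python
# Hypothetical class standing in for something like minidom.Element.
class Greeter:
    def __init__(self, name):
        self.name = name

# Defined outside the class, with self as the first argument.
def _shout(self, punctuation):
    return "HELLO, " + self.name.upper() + punctuation

# Graft the function onto the class under the name we want.
Greeter.shout = _shout

g = Greeter("mark")
result_method = g.shout("!")        # called as a method
result_function = _shout(g, "!")    # called as a plain function
print(result_method == result_function)
```

Both calls run the same code, which is the equivalence described above.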
# duplicate 'getElementsByTagName' as 'all' because I'm lazy
minidom.Element.all = minidom.Element.getElementsByTagName

# add useful 'text' method to minidom.Element class
def _text(self):
    """returns all text of a node in one string

    text may be split into several Text nodes; minidom likes to create
    separate Text nodes for carriage returns and ampersands and things"""
    def isTextNode(node):
        return isinstance(node, minidom.Text)
    def getData(node):
        return node.data
    try:
        return "".join(map(getData, filter(isTextNode, self.childNodes)))
    except:
        return ""
minidom.Element.text = new.instancemethod(_text, None, minidom.Element)
Ditto here; element.text() is now the same as _text(element).
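To see what text() actually returns, here is a small sketch, assuming Python 3 (so the function is assigned directly instead of going through new.instancemethod). The sample XML string is made up for illustration.

```python
from xml.dom import minidom

def _text(self):
    """Join the data of every Text node that is a direct child of self."""
    return "".join(
        node.data for node in self.childNodes
        if isinstance(node, minidom.Text)
    )

# Python 3 style grafting: plain assignment onto the class.
minidom.Element.text = _text

# Made-up sample: a title with a nested element in the middle.
doc = minidom.parseString("<title>Dive <b>Into</b> Mark</title>")
title_text = doc.documentElement.text()
print(repr(title_text))
```

Only the direct Text children are joined, so the "Into" inside the nested b element is skipped. That is fine for simple documents like RSS, but worth knowing before using it on mixed content.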
Now let’s put it to good use. This is a function that parses a well-formed RSS document into a tuple of 4 items: channel title, channel link, channel description, and a list of items, each of which is a tuple of (item title, item link, item description).
def parseRSS(url):
    """parse RSS into (channel-title, channel-link, channel-description,
    [(item-title, item-link, item-description), ...])"""
    usock = urllib.urlopen(url)
    rss = minidom.parse(usock).firstChild
    usock.close()
    channel = rss.first('channel/title').text()
    link = rss.first('channel/link').text()
    description = rss.first('channel/description').text()
    items = [(item.first('title').text(),
              item.first('link').text(),
              item.first('description').text())
             for item in rss.all('item')]
    return channel, link, description, items

if __name__ == '__main__':
    import sys, pprint
    rssurls = sys.argv[1:] or ['http://diveintomark.org/xml/rss.xml']
    pprint.pprint(map(parseRSS, rssurls))
Note that there is virtually zero error handling here: all elements must exist in the source document, or the program will crash. Many actual RSS feeds do not have all of them, and the RSS specification does not require them. This is only an example.
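If you wanted to tolerate missing elements, one possible approach is a lookup that returns a default instead of raising IndexError. This is only a sketch, assuming Python 3; the first_text helper, its default parameter, and the SAMPLE document are all hypothetical and not part of the code above.

```python
from xml.dom import minidom

def first_text(element, path, default=""):
    """Hypothetical helper: text of the first node at path, or default
    if any step of the path is missing."""
    for name in path.split("/"):
        matches = element.getElementsByTagName(name)
        if not matches:
            return default
        element = matches[0]
    return "".join(
        node.data for node in element.childNodes
        if isinstance(node, minidom.Text)
    )

# Made-up feed with no channel link and no item link.
SAMPLE = (
    "<rss><channel><title>Example</title>"
    "<item><title>First post</title></item>"
    "</channel></rss>"
)

rss = minidom.parseString(SAMPLE).documentElement
channel_title = first_text(rss, "channel/title")
channel_link = first_text(rss, "channel/link")
item_links = [first_text(item, "link", default=None)
              for item in rss.getElementsByTagName("item")]
print(channel_title, repr(channel_link), item_links)
```

Note that getElementsByTagName searches all descendants, not just direct children, so "channel/title" finds the channel's own title here only because it comes before the item titles in the document.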

