dive into mark

Monday, January 27, 2003

Dynamically extending APIs

One of the many cool features of Python is that you can dynamically graft new methods onto existing classes at runtime. (Objective-C can also do this; it calls them “categories”.) I do this all the time with the built-in Python XML parsing classes, which are great at parsing XML documents into objects but which suck when it comes to actually using those objects productively. I know, I know, I could install PyXML and use XPath and so forth to navigate the parsed XML document, but that’s a pain and usually overkill. Most XML documents are not that complex, and a few simple utility methods added to the minidom classes can go a long way.

To dynamically graft a method onto an existing class, you can use the new module. First, define a function (outside of any class) as if it were an ordinary method, with self as the first argument and so forth. Then call new.instancemethod with three arguments: that function, None (meaning the method is not bound to any particular instance), and the class you want it grafted onto. new.instancemethod returns an unbound method, which you can simply assign to the class under whatever name you want. For instance:

from xml.dom import minidom
import urllib
import new

# add useful 'first' method to minidom.Element class
def _first(self, path):
    """get first node, possibly recursively

    this is like a poor-man's XPath; the path parameter can
    look like this: "channel/title", which will find the first
    'channel' node off self, then the first 'title' node off
    the channel node"""
    element = self
    for name in path.split("/"):
        element = element.all(name)[0]
    return element
minidom.Element.first = new.instancemethod(_first, None, minidom.Element)

This creates a first method on the minidom.Element class which, when called, executes the code that we defined in our _first function. So element.first(path) is now the same as _first(element, path).

# duplicate 'getElementsByTagName' as 'all' because I'm lazy
minidom.Element.all = minidom.Element.getElementsByTagName

# add useful 'text' method to minidom.Element class
def _text(self):
    """returns all text of a node in one string

    text may be split into several Text nodes; minidom likes to create
    separate Text nodes for carriage returns and ampersands and things"""
    def isTextNode(node):
        return isinstance(node, minidom.Text)
    def getData(node):
        return node.data
    try:
        return "".join(map(getData, filter(isTextNode, self.childNodes)))
    except:
        return ""
minidom.Element.text = new.instancemethod(_text, None, minidom.Element)

Ditto here; element.text() is now the same as _text(element).

Now let’s put it to good use. This is a function that parses a well-formed RSS document into a tuple of four items: the channel title, channel link, channel description, and a list of items, each of which is a tuple of (item title, item link, item description).

def parseRSS(url):
    """parse RSS into (channel-title, channel-link, channel-description,
          [(item-title, item-link, item-description), ...])"""
    usock = urllib.urlopen(url)
    # documentElement is the root element (<rss> or <rdf:RDF>); firstChild
    # could instead be a DOCTYPE or comment node that precedes it
    rss = minidom.parse(usock).documentElement
    usock.close()
    channel = rss.first('channel/title').text()
    link = rss.first('channel/link').text()
    description = rss.first('channel/description').text()
    items = [(item.first('title').text(),
              item.first('link').text(),
              item.first('description').text())
             for item in rss.all('item')]
    return channel, link, description, items

if __name__ == '__main__':
    import sys, pprint
    rssurls = sys.argv[1:] or ['http://diveintomark.org/xml/rss.xml']
    pprint.pprint(map(parseRSS, rssurls))

Note that there is virtually zero error handling here: every element must exist in the source document, or first() raises an IndexError and the program crashes. Many real-world RSS feeds omit some of these elements, and the RSS specification does not require all of them. This is only an example.
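One way to relax that, sketched in Python 3 under my own assumptions: a grafted first that returns None when a path step is missing, plus a hypothetical text_at method that falls back to an empty string. The names text_at, parse_rss_string, and parse_rss and the sample feed are illustrative only, not part of minidom. Because getElementsByTagName searches all descendants, a channel with no title of its own could still pick up an item’s title; this sketch does not guard against that.

```python
import urllib.request
from xml.dom import minidom

def _first(self, path):
    """Follow a slash-separated path of tag names; None if any step is missing."""
    element = self
    for name in path.split("/"):
        matches = element.getElementsByTagName(name)
        if not matches:
            return None
        element = matches[0]
    return element

def _text_at(self, path, default=""):
    """Hypothetical helper: text of the first node at path, or default if absent."""
    node = self.first(path)
    if node is None:
        return default
    return "".join(n.data for n in node.childNodes if isinstance(n, minidom.Text))

minidom.Element.first = _first
minidom.Element.text_at = _text_at

def parse_rss_string(data):
    """Same tuple shape as parseRSS, with '' for any missing element."""
    rss = minidom.parseString(data).documentElement
    items = [(item.text_at("title"), item.text_at("link"),
              item.text_at("description"))
             for item in rss.getElementsByTagName("item")]
    return (rss.text_at("channel/title"), rss.text_at("channel/link"),
            rss.text_at("channel/description"), items)

def parse_rss(url):
    """Fetch a feed with Python 3's urllib.request and parse it."""
    with urllib.request.urlopen(url) as response:
        return parse_rss_string(response.read())

SAMPLE = ("<rss><channel><title>T</title><link>http://example.com/</link>"
          "<item><title>A</title></item>"
          "</channel></rss>")
print(parse_rss_string(SAMPLE))
# prints: ('T', 'http://example.com/', '', [('A', '', '')])
```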

© 2001-8 Mark Pilgrim