blogrollfinder.py has been updated again; there are two major changes.
First, it now looks for the published subscription file for Radio-controlled weblogs, and uses this as the blogroll if found. (For example, here are Sam Ruby’s subscriptions.) If not found, it falls back to the old logic of scraping the HTML to find lists of links separated only by tags and whitespace.
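To make the fallback heuristic concrete, here is a minimal sketch of the idea, not the actual code from blogrollfinder.py. The function name `find_link_runs`, the `min_links` threshold, and the regular expressions are all assumptions made for illustration. It treats consecutive links as one list as long as nothing but tags and whitespace sits between them.

```python
import re

# Hypothetical patterns for this sketch; real-world HTML is messier than this.
ANCHOR = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\'][^>]*>.*?</a>', re.I | re.S)
TAG = re.compile(r'<[^>]+>')

def find_link_runs(html, min_links=3):
    """Return lists of hrefs whose anchors are separated only by tags and whitespace."""
    runs, current, last_end = [], [], None
    for match in ANCHOR.finditer(html):
        if last_end is not None:
            # Strip tags from the gap; anything left besides whitespace is visible text.
            gap = TAG.sub('', html[last_end:match.start()])
            if gap.strip():
                # Visible text ends the current run; keep it only if it is long enough.
                if len(current) >= min_links:
                    runs.append(current)
                current = []
        current.append(match.group(1))
        last_end = match.end()
    if len(current) >= min_links:
        runs.append(current)
    return runs
```

Links that appear inside running prose are surrounded by visible text, so they never build up a run long enough to pass the threshold, while a sidebar list of links does.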
Second, I added a findNewBlogsByGoogleRelated function, which takes your URL and uses the Google API to find sites related to yours, then checks each of those sites’ blogrolls, aggregates the results, and diffs them against your blogroll. This answers the question, What people are the people related to you reading that you’re not reading?
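The aggregate-and-diff step can be sketched roughly like this. This is an illustration under stated assumptions, not the body of findNewBlogsByGoogleRelated: the names `diff_blogrolls` and `normalize` are hypothetical, the Google lookup is left out entirely, and the related sites' blogrolls are assumed to have been collected already into a dictionary.

```python
from collections import Counter

def normalize(url):
    # Crude normalization (an assumption of this sketch): trim whitespace
    # and any trailing slash so near-identical URLs compare equal.
    return url.strip().rstrip('/')

def diff_blogrolls(my_blogroll, related_blogrolls):
    """Count how many related sites link to each URL that is not in my blogroll.

    related_blogrolls maps a related site's URL to the list of URLs in its blogroll.
    Returns (url, count) pairs, most frequently linked first.
    """
    mine = {normalize(url) for url in my_blogroll}
    counts = Counter()
    for blogroll in related_blogrolls.values():
        # Use a set so a site that lists the same link twice only counts once.
        for url in {normalize(u) for u in blogroll}:
            if url not in mine:
                counts[url] += 1
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Sorting by count puts the sites that the most neighbors read at the top, which is probably the most interesting part of the list.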
Here’s my list:
This would make an interesting web service. The table above was output verbatim by the script, except for one particularly long URL which I hand-edited later. The whole process is not horrendously slow (15 seconds max; all the HTML requests run in parallel threads and time out rapidly, so the script never spends too much time waiting for a single slow site), but then again, the results should only change slowly over time, so they could be computed once and then cached. Unfortunately, it takes 3 queries to Google to compile the related list, and I have a limit of 1000 queries a day. (And, due to early runaway bugs, I’m almost tapped for today.) Maybe tomorrow.
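For the curious, the parallel-fetch-with-timeout pattern looks something like the following in current Python. This is a sketch using the standard library's thread pool, not the threading code the script actually uses; `fetch`, `fetch_all`, the 5-second timeout, and the worker count are all assumptions picked for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url, timeout=5):
    # The timeout applies to blocking socket operations such as connecting,
    # so a dead site fails quickly instead of hanging its thread.
    with urlopen(url, timeout=timeout) as response:
        return response.read()

def fetch_all(urls, fetcher=fetch, max_workers=20):
    """Fetch every URL in parallel; map each URL to its content, or None on failure."""
    def safe(url):
        try:
            return fetcher(url)
        except Exception:
            # A slow or broken site should not sink the whole run.
            return None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(safe, urls)))
```

Because the requests overlap, the total wall-clock time is governed by the slowest site rather than the sum of all of them, which is why a short timeout keeps the whole run quick.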
Update: I wrote the web service, so you can try it yourself and find your own neighborhood.

