dive into mark

Monday, September 11, 2006

HOWTO parse feeds on the command line

$ ./feedparser.py --help
usage: feedparser.py [options] url_or_filename_or_-

options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -A AGENT, --user-agent=AGENT
                        User-Agent for HTTP URLs
  -e URL, --referer=URL, --referrer=URL
                        Referrer for HTTP URLs
  -t TAG, --etag=TAG    ETag/If-None-Match for HTTP URLs
  -m DATE, --last-modified=DATE
                        Last-modified/If-Modified-Since for HTTP URLs (any
                        supported date format)
  -f FORMAT, --format=FORMAT
                        output results in FORMAT (text, pprint)
  -v, --verbose         write debugging information to stderr

$ d=`./feedparser.py -f text http://diveintomark.org/tag/diveintomarkshow/feed/`
$ echo "${d}" | grep ^feed.title_detail
feed.title_detail.base=http://diveintomark.org/tag/diveintomarkshow/feed/
feed.title_detail.language=en
feed.title_detail.type=text/plain
feed.title_detail.value=dive into mark / diveintomarkshow
$ echo "${d}" | grep ^headers.last-modified
headers.last-modified=Sat, 09 Sep 2006 22:53:49 GMT
$ echo "${d}" | grep ^etag
etag="a922d132f3216fe5487bad4b2b5271cc"
$ echo "${d}" | grep ^etag | cut -d= -f2-
"a922d132f3216fe5487bad4b2b5271cc"
$ echo "${d}" | grep ^entries | grep enclosures | grep href= | cut -d= -f2-
http://diveintomark.org/public/2006/08/20060822.mp4
http://diveintomark.org/public/2006/08/20060822.srt
http://diveintomark.org/public/2006/08/20060809.mp4
http://diveintomark.org/public/2006/07/20060727.mp4
http://diveintomark.org/public/2006/07/20060727.srt
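
The same filtering can be done without grep and cut. Here is a minimal Python 3 sketch, assuming only what the session above shows: every line of the text format is `key=value`, and the value is everything after the first `=`. The helper names (`cut_value`, `grep_values`) and the key spelling `entries[0].enclosures[0]...` in the sample are hypothetical; the post does not show how entry keys are actually spelled.

```python
def cut_value(line):
    """Rough equivalent of `cut -d= -f2-`: everything after the first '='.

    Unlike cut, this returns '' for a line that has no '=' at all.
    """
    return line.partition("=")[2]


def grep_values(text, prefix, *needles):
    """Keep lines that start with `prefix` and contain every needle,
    then return only their values (like grep ^prefix | grep ... | cut)."""
    return [
        cut_value(line)
        for line in text.splitlines()
        if line.startswith(prefix) and all(n in line for n in needles)
    ]


# Sample input. The first two lines are copied from the session above;
# the entries[...] key spelling is a made-up placeholder.
sample = "\n".join([
    "feed.title_detail.type=text/plain",
    'etag="a922d132f3216fe5487bad4b2b5271cc"',
    "entries[0].enclosures[0].href=http://diveintomark.org/public/2006/08/20060822.mp4",
    "entries[0].enclosures[0].type=video/mp4",
])

print(grep_values(sample, "etag"))
print(grep_values(sample, "entries", "enclosures", "href="))
```

Matching on whole lines rather than on parsed keys keeps the sketch as close as possible to the grep pipeline, so it makes no further guesses about the format.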

Command-line support requires Python 2.3 or later, since it uses optparse.
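
For readers who have not used optparse, here is a rough sketch of how options like the ones in the help output above could be declared. This is not feedparser's actual source: the `dest` names, the version string, and the `pprint` default are assumptions made for illustration. It runs on Python 3, where optparse is still in the standard library.

```python
from optparse import OptionParser


def build_parser():
    # -h/--help is added automatically; --version is added because a
    # version string is supplied (the string itself is a placeholder).
    parser = OptionParser(
        usage="%prog [options] url_or_filename_or_-",
        version="%prog (sketch)",
    )
    parser.add_option("-A", "--user-agent", dest="agent", metavar="AGENT",
                      help="User-Agent for HTTP URLs")
    # One option may carry several long spellings.
    parser.add_option("-e", "--referer", "--referrer", dest="referrer",
                      metavar="URL", help="Referrer for HTTP URLs")
    parser.add_option("-t", "--etag", dest="etag", metavar="TAG",
                      help="ETag/If-None-Match for HTTP URLs")
    parser.add_option("-m", "--last-modified", dest="modified", metavar="DATE",
                      help="Last-modified/If-Modified-Since for HTTP URLs")
    # Assumed default: the help text does not say which format is the default.
    parser.add_option("-f", "--format", dest="format", metavar="FORMAT",
                      default="pprint",
                      help="output results in FORMAT (text, pprint)")
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose",
                      default=False,
                      help="write debugging information to stderr")
    return parser


options, args = build_parser().parse_args(
    ["-f", "text", "--referrer", "http://example.org/", "feed.xml"]
)
print(options.format, options.referrer, args)
```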

In case it was too subtle, this also serves as an announcement of nightly.feedparser.org, your source of nightly builds and up-to-the-minute documentation (or lack thereof). The site is fully automated and mostly unmonitored, which means that when it fails, it will do so in a most spectacular fashion.

16 comments

  1. Keep going:

    $ echo "${d}" | grep ^entries | grep enclosures | grep href= | cut -d= -f2- | xargs -n1 curl -v -O

    Comment by Ryan Tomayko — Monday, September 11, 2006 @ 2:54 am

  2. Too nerdy!

    /me hides under the table.

    Comment by nikkiana — Monday, September 11, 2006 @ 3:16 am

  3. Somehow the return value of feed.title_detail.type seems wrong…

    Comment by Anne van Kesteren — Monday, September 11, 2006 @ 6:52 am

  4. Pingback by links for 2006-09-11 « Breyten’s Dev Blog

  5. Just committed command-line etag/last-modified support and updated the post accordingly.

    @Anne: ? The serialized values match the values you get in Python in the returned data structure.

    Comment by Mark — Monday, September 11, 2006 @ 10:09 am

  6. Sorry, I confused the media type of the title’s contents with the media type of the feed. I was actually hoping I had aborted the request to post that comment, but it happened much quicker than I thought!

    By the way, you seem to be missing an end tag around here.

    Comment by Anne van Kesteren — Monday, September 11, 2006 @ 11:35 am

  7. > By the way, you seem to be missing an end tag around here.

    Fixed. Damn you commenters leaving invalid HTML in my comments. (Also, AFAICT, damn Wordpress’s “smart” formatting functions that make bad problems worse.)

    Comment by Mark — Monday, September 11, 2006 @ 11:50 am

  8. > Damn you commenters leaving invalid HTML in my comments.

    Haven’t we been here before …?

    Comment by Jacques Distler — Tuesday, September 12, 2006 @ 2:01 am

  9. In your find command, it’s more efficient to use -delete rather than -exec rm {} \;.

    Comment by Basil Crow — Tuesday, September 12, 2006 @ 12:57 pm

  10. PLZ Help!!

    Can anyone tell me how I can put the logo on the Wiss RSS of my site?

    Comment by Mave — Tuesday, September 12, 2006 @ 1:31 pm

  11. What find command?

    Comment by Mark — Tuesday, September 12, 2006 @ 3:07 pm

  12. $ curl -s http://nightly.feedparser.org/update-nightly.sh | grep -n find
    9:find . -name ".#*" -exec rm {} \;

    Comment by Ryan Tomayko — Tuesday, September 12, 2006 @ 3:26 pm

  13. Ah. Indeed. But I don’t appear to have that option.

    $ find . -name ".#*" -delete
    find: invalid predicate `-delete'
    $ find . -name ".#*" --delete
    find: invalid predicate `--delete'
    $ man find | grep delete
    $ find --version
    GNU find version 4.1.7

    Comment by Mark — Tuesday, September 12, 2006 @ 3:31 pm

  14. > But I don’t appear to have that option.

    Your find is old.

    $ find --help | grep delete
    -fls FILE -ok COMMAND ; -print -print0 -printf FORMAT -prune -ls -delete

    $ find --version
    GNU find version 4.2.20
    Features enabled: D_TYPE O_NOFOLLOW(enabled)

    Comment by prakash — Tuesday, September 12, 2006 @ 4:18 pm

  15. Thanks, all. Very informative. Unfortunately, the script is running on a shared host where I don’t have privileges to upgrade the find command, so I’ll have to leave it as-is for the time being.

    Comment by Mark — Tuesday, September 12, 2006 @ 4:25 pm

  16. Mark, why don’t you make a bash output format for feedparser? The text format is almost parseable by bash, except that . should be replaced by something like __, the text on the right-hand side should be enclosed in single quotes, and, most difficult of all, the arrays should be represented somehow. But once this is done, one could do:

    $ eval $(./feedparser.py -f text http://diveintomark.org/tag/diveintomarkshow/feed/)
    $ echo $encoding
    utf-8
    $ echo $feed__title_detail__base
    http://diveintomark.org/tag/diveintomarkshow/feed/

    And so on. I don’t know whether this would be very useful, though.

    – Nilton

    Comment by Nilton Volpato — Saturday, September 23, 2006 @ 12:02 am
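
The conversion suggested in the last comment could be prototyped as a filter over the existing text format. What follows is a sketch in Python 3 under stated assumptions, not a feature of feedparser: the function name `to_bash_assignments` is made up, the `entries[0].title` key spelling in the usage line is hypothetical, and flattening every character that bash does not allow in a variable name to `_` is just one arbitrary answer to the array question. Values go through `shlex.quote`, which adds single quotes only when a value needs them.

```python
import re
import shlex


def to_bash_assignments(text):
    """Turn 'key=value' lines into shell-safe 'name=value' assignments."""
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or not key:
            continue  # not a key=value line
        name = key.replace(".", "__")                 # the '.' -> '__' rule
        name = re.sub(r"[^A-Za-z0-9_]", "_", name)    # flatten [, ], -, etc.
        if name[0].isdigit():
            name = "_" + name                         # names cannot start with a digit
        out.append("%s=%s" % (name, shlex.quote(value)))
    return "\n".join(out)


print(to_bash_assignments(
    "feed.title_detail.value=dive into mark / diveintomarkshow\n"
    "headers.last-modified=Sat, 09 Sep 2006 22:53:49 GMT\n"
    "entries[0].title=hypothetical entry key"
))
```

Two different keys can collapse to the same variable name under this scheme, so a real bash format would need a more careful mapping for arrays.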

© 2001-8 Mark Pilgrim