Prentiss Riddle: Toys

aprendiz de todo, maestro de nada

Prentiss Riddle
aprendizdetodo.com
riddle@io.com

 

Screen-scraping the Chronicle music listings

This weekend I added a new feature in the sidebar of my music page: selections from the Austin Chronicle music listings.

I do this by the dangerous process known as screen scraping. I have a little Perl script which downloads the next week's worth of Chronicle music pages, attempts to parse them for the venue/band entries, matches those against a list of bands I'm interested in, and spits out the result in HTML. I run the script in the wee hours every morning. The risk -- eventually a certainty -- is that the Chronicle will change the format of their pages and my script will break. In the utopia of the Semantic Web their pages would have machine-readable markup indicating "musical event", "venue", and "performer", but of course this is the real web and I'm having to decipher markup intended for computer screens, not personal entertainment 'bots.
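For the curious, here is a minimal sketch of that download/parse/match/emit pipeline. It is written in Python rather than Perl and is not the actual script. The page markup in SAMPLE_PAGE, the venue and band names, and every function name are made up for illustration; the real Chronicle format isn't reproduced here, so the regular expression is an assumption that would have to be adapted to whatever the pages really look like.

```python
import html
import re
import urllib.request

# HYPOTHETICAL markup: one paragraph per venue, bands separated by commas.
# The real listings format is not shown in this post.
SAMPLE_PAGE = """
<p><b>Venue A</b>: Band One, Band Two</p>
<p><b>Venue B</b>: Band Three</p>
"""

# Assumed pattern for a venue/band entry; this is the fragile part that
# breaks whenever the page layout changes.
ENTRY_RE = re.compile(r"<p><b>(?P<venue>[^<]+)</b>:\s*(?P<bands>[^<]+)</p>")


def fetch(url):
    """Download one listings page (the URL is supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_entries(page):
    """Yield (venue, band) pairs found in the page text."""
    for m in ENTRY_RE.finditer(page):
        venue = html.unescape(m.group("venue").strip())
        for band in m.group("bands").split(","):
            band = html.unescape(band.strip())
            if band:
                yield venue, band


def select(entries, watchlist):
    """Keep only entries whose band is on the watch list (case-insensitive)."""
    wanted = {name.casefold() for name in watchlist}
    return [(v, b) for v, b in entries if b.casefold() in wanted]


def render(matches):
    """Emit a small HTML fragment suitable for a sidebar."""
    if not matches:
        # An empty result is the usual symptom of a format change.
        return "<p>No watched bands found (or the page format changed).</p>"
    items = "\n".join(
        f"  <li>{html.escape(b)} at {html.escape(v)}</li>" for v, b in matches
    )
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    found = select(parse_entries(SAMPLE_PAGE), ["band two", "Band Three"])
    print(render(found))
```

The one design choice worth noting is the explicit empty-result message: when the page format changes, the regular expression simply stops matching, and a silent blank sidebar is harder to notice than a line saying something went wrong.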

The humbling part of the process was coming up with the list of bands to watch for. You think you're all hip and knowledgeable about the Live Music Capital of the World and then you realize you can't even identify the broad musical genre of 98% of the bands in the Chronicle every week.

(Any Chronicle web dude(ette)s reading this: Your pages say "commercial use prohibited", which exempts my little blog, and in any case I believe Fair Use would apply here. If my re-use of a small portion of your listings bums you out, please get in touch. And if you'd like to kick around ideas about how the Chronicle could provide personalized services like this to its readers, by all means let's have lunch.)

toys 2003.09.15