This weekend I added a new feature in the sidebar of my music page: selections from the Austin Chronicle music listings.
I do this by the dangerous process known as screen scraping. I have a little Perl script which downloads the next week's worth of Chronicle music pages and attempts to parse them for the venue/band entries, then matches those against a list of bands I'm interested in and spits out the result in HTML. I run the script in the wee hours every morning. The risk, eventually a certainty, is of course that the Chronicle will change the format of their pages and my script will break. In the utopia of the Semantic Web their pages would have machine-readable markup indicating "musical event", "venue", and "performer", but of course this is the real web and I'm having to decipher markup intended for computer screens, not personal entertainment 'bots.
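For the curious, here is roughly what the parse, match, and print steps look like. This is a minimal sketch in Python rather than the Perl script itself, and it leaves out the download step. The listing markup (one `<p>` per venue, venue name in `<b>`, bands separated by commas), the venue and band names, and the function names `parse_listings`, `pick_watched`, and `render_sidebar` are all made up for illustration; the real Chronicle pages are surely messier.

```python
import re
from html import escape

# ASSUMPTION: hypothetical markup, one <p> per venue with the venue name
# in <b> and a comma-separated band list after it. Not the real page format.
ENTRY_RE = re.compile(r"<p><b>(?P<venue>[^<]+)</b>\s*(?P<bands>[^<]*)</p>")


def parse_listings(page_html):
    """Return a list of (venue, [band, ...]) pairs found in one page."""
    listings = []
    for m in ENTRY_RE.finditer(page_html):
        bands = [b.strip() for b in m.group("bands").split(",") if b.strip()]
        listings.append((m.group("venue").strip(), bands))
    return listings


def pick_watched(listings, watchlist):
    """Keep (venue, band) pairs whose band is on the watchlist, ignoring case."""
    wanted = {name.casefold() for name in watchlist}
    return [
        (venue, band)
        for venue, bands in listings
        for band in bands
        if band.casefold() in wanted
    ]


def render_sidebar(picks):
    """Render the picks as a small HTML list, escaping the scraped text."""
    if not picks:
        # An empty result is the first visible symptom of a format change.
        return "<p>No watched bands found (or the page format changed).</p>"
    items = "\n".join(
        f"<li>{escape(band)} at {escape(venue)}</li>" for venue, band in picks
    )
    return f"<ul>\n{items}\n</ul>"


# Made-up sample page in the hypothetical format above.
SAMPLE_PAGE = """
<p><b>Example Room</b> The Placeholders, Some Other Band</p>
<p><b>Sample Lounge</b> Yet Another Band</p>
"""

if __name__ == "__main__":
    picks = pick_watched(parse_listings(SAMPLE_PAGE), ["the placeholders"])
    print(render_sidebar(picks))
```

Note how fragile the regular expression is: if the page switches from `<b>` to some other tag, `parse_listings` quietly returns nothing. That is exactly the breakage I'm expecting sooner or later.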
The humbling part of the process was coming up with the list of bands to watch for. You think you're all hip and knowledgeable about the Live Music Capital of the World and then you realize you can't even identify the broad musical genre of 98% of the bands in the Chronicle every week.
(Any Chronicle web dude(ette)s reading this: Your pages say "commercial use prohibited", which exempts my little blog, and in any case I believe Fair Use would apply here. If my re-use of a small portion of your listings bums you out, please get in touch. And if you'd like to kick around ideas about how the Chronicle could provide personalized services like this to its readers, by all means let's have lunch.)