There’s More Than One Way to Scrape a Site

A request came in to ScraperWiki to scrape information on the Members of the European Parliament. I put it out on Twitter and Facebook, hoping that a kind member of the ScraperWiki community had spent so much time at the computer that he or she had no life at all. I had to turn people away!

Within minutes, two tweeters wanted to give it a go and I got a reply on Facebook. In fact, Tim Green had already scraped the names and URLs of MEPs by the time I got back to him to say it had already been claimed on Twitter by Pall Hilmarsson.

Although both scrapers are looking at the same site, Tim’s is less than 20 lines of code and, with only 8 revisions, a very quick scrape. Pall’s went for the full shebang, scraping opinions and speeches and generally drilling down into the data a whole lot more. Hence the nearly 200 lines of code!
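To give a flavour of what the short end of that spectrum can look like, here is a minimal sketch of a names-and-URLs scrape using only the Python standard library. It is not Tim’s or Pall’s code: the `LinkCollector` class, the `scrape_names_and_urls` helper, the sample HTML and the `example.org` base URL are all made up for illustration, and the real European Parliament pages will be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            name = " ".join("".join(self._text).split())
            if name:
                self.links.append((name, urljoin(self.base_url, self._href)))
            self._href = None


def scrape_names_and_urls(html, base_url):
    """Return a list of (name, url) tuples found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical listing page; the real site's markup is an unknown here.
SAMPLE_HTML = """
<ul>
  <li><a href="/members/1">Example Member One</a></li>
  <li><a href="/members/2">Example  Member Two</a></li>
</ul>
"""
BASE_URL = "http://example.org"

if __name__ == "__main__":
    for name, url in scrape_names_and_urls(SAMPLE_HTML, BASE_URL):
        print(name, url)
```

A deeper scrape in the spirit of Pall’s would then follow each collected URL and parse the page behind it, which is where the extra lines of code tend to come from.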

So if you’re a code junkie, take a look at what it takes to scrape and then scrape further by comparing scrapers/meps with scrapers/meps_2. Also, Tim kindly scraped the next request: National Historic Ships Register. To Tim and Pall I say: if the ScraperWiki digger were capable of emotion, you would both be receiving a diesel-greasy kiss!

European Parliament Members and National Historic Ships – you’ve been ScraperWikied! (with help from your friendly neighbourhood programmers)
