New Ruby scraping tutorials – PDFs and Mechanize

Mark Chapman has made us two new Ruby tutorials.

Advanced Scraping: Pages Behind Forms shows you how to get data that is buried behind search boxes and drop-down query lists. It uses the Mechanize library, which pretends to be a web browser, so it handles cookies for you and has a familiar interface.

Advanced Scraping: PDFs shows you how to extract information from PDF (Portable Document Format) files. It uses the Ruby library PDF::Reader, which handles the text extraction step; working out how to parse the extracted text is a later skill.

You can find all the Ruby tutorials (and links to Python and PHP ones) on one page.

Thanks Mark!
