We’ve had lots of requests recently for new 3rd party libraries to be accessible from within ScraperWiki. For those of you who don’t know, yes, we take requests for installing libraries! Just send us word on the feedback form and we’ll be happy to install.
Also, let us know why you want them, as it’s great to know what you guys are up to. Ross Jones has been busily adding them (he runs on beer, should you ever see him and want to return the favour).
Find them listed in the “3rd party libraries” section of the documentation.
In Python, we’ve added:
- csvkit, a bunch of tools for handling CSV files made by Christopher Groskopf at the Chicago Tribune. Christopher is now lead developer on PANDA, a Knight News Challenge winner that is building a newsroom data hub.
- requests, a lovely new layer over Python’s HTTP libraries made by Kenneth Reitz. It makes GET and POST requests much simpler.
- Scrapemark is a way of extracting data from HTML using reverse templates. You give it the HTML and the template, and it pulls out the values.
- pipe2py was requested by Tony Hirst, and can be used to migrate from Yahoo Pipes to ScraperWiki.
- PyTidyLib, for accessing HTML Tidy, the classic C library that cleans up messy HTML files.
- SciPy is at the analysis end: it builds on NumPy, adding routines for statistics, Fourier transforms, image processing and lots more.
- matplotlib, which can almost magically make PNG charts. See this example that The Julian knocked up, with the boilerplate to run it from a ScraperWiki view.
- Google Data (gdata) for calling various Google APIs to get data.
- Twill, a browser-automation scripting language built as a layer on top of Mechanize.
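To give a flavour of the “reverse template” idea behind Scrapemark: you write the HTML you expect, mark the bits you want to capture, and the values get pulled out for you. Here’s a minimal standard-library sketch of that idea (this is an illustration of the concept, not Scrapemark’s actual API):

```python
import re

def reverse_template(template, html):
    """Turn a template like '<b>{{name}}</b>' into a regex,
    matching everything else literally, and pull out the
    named values from the HTML."""
    pattern = ""
    pos = 0
    for m in re.finditer(r"\{\{(\w+)\}\}", template):
        # literal text between captures is escaped and matched as-is
        pattern += re.escape(template[pos:m.start()])
        # each {{name}} becomes a non-greedy named capture group
        pattern += "(?P<%s>.*?)" % m.group(1)
        pos = m.end()
    pattern += re.escape(template[pos:])
    match = re.search(pattern, html, re.DOTALL)
    return match.groupdict() if match else None

html = '<li><b>csvkit</b> by <i>Christopher Groskopf</i></li>'
print(reverse_template('<b>{{name}}</b> by <i>{{author}}</i>', html))
# {'name': 'csvkit', 'author': 'Christopher Groskopf'}
```

Scrapemark itself handles the awkward realities of HTML (whitespace, attribute order and so on) far more robustly than this toy version.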
In Ruby, we’ve added:
- tmail for parsing emails.
- typhoeus, a wrapper around curl with a friendlier syntax that also lets you make parallel HTTP requests.
- Google Data (gdata) for calling various Google APIs.
In PHP, we’ve added:
- GeoIP for turning IP addresses into countries and cities.