1. Scrapers are now completely language neutral. Not just Python and Ruby – but anything open source that can read or write an SQLite file, from R to Clojure.
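The contract is just an SQLite file on disk, so any language with an SQLite driver can play. A minimal sketch in Python using only the standard library (the filename and table name here follow ScraperWiki convention, but any would do):

```python
import sqlite3

# Open (or create) the scraper's SQLite file. Any language with an
# SQLite driver -- R, Clojure, whatever -- could write this same file.
conn = sqlite3.connect("scraperwiki.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS swdata (id INTEGER PRIMARY KEY, name TEXT)")

# Pretend these rows came from a scrape.
rows = [(1, "Alice"), (2, "Bob")]
conn.executemany("INSERT OR REPLACE INTO swdata (id, name) VALUES (?, ?)", rows)
conn.commit()

# Any other tool or language can now read the same file.
count = conn.execute("SELECT COUNT(*) FROM swdata").fetchone()[0]
print(count)  # prints 2
conn.close()
```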
2. Scrapers can have as many files as they like. So you can use modules, write separate tests… whatever you want to do.
3. You can use any open source version control system to keep your scraper’s code. And push it to Github, Google Code or wherever you like.
4. None of the outgoing ports are blocked — if you need to use SMTP or NNTP or an HTTP proxy on a funny port, go for it. Actually, come to think of it, the incoming ones aren’t either, although we don’t guarantee persistence of anything you have listening on them.
5. HTTPS just works. There’s no invisible proxy breaking all the certificate checking.
6. You can SSH into your scraper, and do anything — keep tmux sessions, edit the code with vim, debug it with the Python debugger — we don’t care! It’s just like a Unix shell account.
7. Lots of standard libraries are built in, as before. But if needed, you can install any software using your language’s package manager, such as pip. You can even compile it with ./configure; make; make install.
8. ScraperWiki Classic’s datastore commands are now just standard modules in Python and Ruby. Go ahead: pip install scraperwiki or gem install scraperwiki. Then run your scraper on your laptop or on your own servers.
9. The “Code in your browser” tool is just another tool, like the “Search for Tweets” tool or the “Upload a spreadsheet” tool. Its source code is on Github. You can make your own tools that get data.
10. Once you’ve scraped your data, you can use any tool with it. Of course there are basic things like “View in a table”, or “Download as spreadsheet”. But there are also fancier ones like “Query with SQL” and “Summarise this data”. You can make your own tools that use data.
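Tools like “Query with SQL” fall out of the same design: scraped data is just an SQLite file, so plain SQL works against it with no special API. A rough sketch of the idea in Python (the table, column names, and figures are invented for illustration):

```python
import sqlite3

# An in-memory dataset standing in for a scraper's output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swdata (city TEXT, population INTEGER)")
conn.executemany("INSERT INTO swdata VALUES (?, ?)",
                 [("London", 8900000), ("Leeds", 790000), ("York", 200000)])

# A "Summarise this data" style query: ordinary SQL, nothing proprietary.
total, biggest = conn.execute(
    "SELECT SUM(population), MAX(population) FROM swdata").fetchone()
print(total, biggest)
conn.close()
```

A “Download as spreadsheet” tool would be the same trick in reverse: SELECT everything and write it out as CSV.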
Sign up to try it at beta.scraperwiki.com