10 technical things you didn’t know about the new ScraperWiki

1. Scrapers are now completely language-neutral. Not just Python and Ruby, but anything open source that can make or read an SQLite file, from R to Clojure.

2. Scrapers can have as many files as they like. So you can use modules, write separate tests… whatever you want to do.

3. You can keep your scraper’s code in any open source version control system, and push it to GitHub, Google Code or wherever you like.

4. None of the outgoing ports are blocked — if you need to use SMTP or NNTP or an HTTP proxy on a funny port, go for it. Actually, come to think of it, the incoming ones aren’t either, although we don’t guarantee persistence of anything you have listening on them.

5. HTTPS just works. There’s no invisible proxy breaking all the certificate checking.

6. You can SSH into your scraper, and do anything — keep tmux sessions, edit the code with vim, debug it with the Python debugger — we don’t care! It’s just like a Unix shell account.

7. Lots of standard libraries are built in, as before. But if you need something else, you can install it with your language’s package manager, such as npm or pip. You can even compile software from source with ./configure; make; make install.

8. ScraperWiki Classic’s datastore commands are now just standard modules in Python and Ruby. Go ahead: pip install scraperwiki or gem install scraperwiki. Then run your scraper on your laptop or on your own servers.

9. The “Code in your browser” tool is just another tool, like the “Search for Tweets” tool or the “Upload a spreadsheet” tool. Its source code is on GitHub. You can make your own tools that get data.

10. Once you’ve scraped your data, you can use any tool with it. Of course there are basic things like “View in a table”, or “Download as spreadsheet”. But there are also fancier ones like “Query with SQL” and “Summarise this data”. You can make your own tools that use data.
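To make point 1 concrete, here is a minimal Python sketch of the one thing a scraper has to do: write rows into an SQLite file. It uses only the standard-library sqlite3 module. The table name swdata and its name/value columns are assumptions for illustration, not a documented ScraperWiki schema.

```python
import sqlite3


def save_rows(conn, rows):
    """Upsert scraped rows (dicts with 'name' and 'value' keys) into a table.

    The table name 'swdata' and its two columns are illustrative assumptions.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS swdata (name TEXT PRIMARY KEY, value INTEGER)"
    )
    # INSERT OR REPLACE overwrites any row whose primary key already exists,
    # so re-running the scraper updates rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO swdata (name, value) VALUES (:name, :value)",
        rows,
    )
    conn.commit()
```

On a scraper you might call save_rows(sqlite3.connect("scraperwiki.sqlite"), rows), where the file name is again an assumption. Any language with an SQLite binding, R or Clojure included, could write the same table.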
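Point 2 means a scraper can be split into a parsing module and a separate test file. The sketch below squeezes both into one listing so it runs on its own. The names extract_links and ExtractLinksTest, and the file split mentioned in the comments, are hypothetical; the parser is built on the standard library’s html.parser.

```python
import unittest
from html.parser import HTMLParser


# In a real scraper this part might live in its own module, e.g. parse.py.
class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# ...and this part in a separate file, e.g. test_parse.py,
# which would import extract_links from the module above.
class ExtractLinksTest(unittest.TestCase):
    def test_finds_hrefs_and_skips_anchors_without_one(self):
        html = '<p><a href="/one">1</a> <a name="x">no href</a> <a href="/two">2</a></p>'
        self.assertEqual(extract_links(html), ["/one", "/two"])
```

With the files split that way, running python -m unittest from the scraper’s directory would pick the test up.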
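For point 4, sending a scraper’s requests through an HTTP proxy on a non-standard port needs nothing special in Python’s standard library. The proxy address below is a placeholder, and make_proxied_opener is a hypothetical helper.

```python
import urllib.request

# Placeholder proxy; port 8123 stands in for any non-standard port.
PROXY_URL = "http://proxy.example.com:8123"


def make_proxied_opener(proxy_url=PROXY_URL):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener replaces its default ProxyHandler with the one passed in.
    return urllib.request.build_opener(handler)
```

Nothing is fetched until you call opener.open(url) on the returned opener.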
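Point 5 means ordinary certificate checking should just work. In Python, ssl.create_default_context() turns on both certificate verification and hostname checking against the system’s trusted CA certificates. The helper names here are hypothetical.

```python
import ssl
import urllib.request


def make_verifying_context():
    """Return an SSL context that rejects invalid certificates and mismatched hostnames."""
    # create_default_context() sets verify_mode to CERT_REQUIRED, enables
    # check_hostname and loads the system's trusted CA certificates.
    return ssl.create_default_context()


def fetch_https(url, timeout=30):
    """Download url; an exception is raised if the server's certificate does not verify."""
    with urllib.request.urlopen(
        url, context=make_verifying_context(), timeout=timeout
    ) as response:
        return response.read()
```

Without a proxy tampering with the connection, fetch_https("https://…") behaves the same on a scraper as on your own machine.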
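Because of point 8, the same scraper code can run on ScraperWiki or on your laptop. One way to keep it portable, sketched below, is to pass the save function in. This assumes the Python module’s scraperwiki.sqlite.save(unique_keys, data) call as used in ScraperWiki Classic; check the module’s README for the exact signature. parse_rows and run are hypothetical names, and the CSV input stands in for whatever your scraper downloads.

```python
import csv
import io


def parse_rows(csv_text):
    """Turn CSV text (e.g. a downloaded file) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def run(csv_text, save):
    """Parse csv_text and hand the rows to save().

    save is assumed to behave like scraperwiki.sqlite.save(unique_keys, data);
    passing it in lets the same code run on ScraperWiki, a laptop or in a test.
    """
    rows = parse_rows(csv_text)
    save(unique_keys=["id"], data=rows)
    return len(rows)
```

With the module installed, run(csv_text, scraperwiki.sqlite.save) would write the rows to its SQLite datastore; a test can pass in a fake save instead.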
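The tools in point 10 only need to read the SQLite file. As a rough, hypothetical idea of what a “Summarise this data” tool might compute (not its actual code), this sketch counts non-null and distinct values per column. Table and column names are quoted as identifiers because SQLite cannot bind identifiers as parameters; the default table name swdata is an assumption, and names containing a double quote are not handled.

```python
import sqlite3


def summarise(conn, table="swdata"):
    """Return (column_name, non_null_count, distinct_count) for every column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    summary = []
    for col in columns:
        # COUNT(col) skips NULLs; COUNT(DISTINCT col) also skips NULLs.
        non_null, distinct = conn.execute(
            f'SELECT COUNT("{col}"), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        summary.append((col, non_null, distinct))
    return summary
```

A “Query with SQL” tool is the same idea with the SELECT supplied by the user instead of generated.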

Sign up to try it at beta.scraperwiki.com.
