ScraperWiki has always made it as easy as possible to write scripts that get data from web pages. Our new platform is no exception: the new browser-based coding environment is a tool like any other.
Here are 9 things you should know about it.
1. You can use any language you like.
2. We recommend Python, as it is easy to read and has particularly good libraries for doing data science.
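To give a flavour of what a scraper looks like, here is a minimal sketch in Python using only the standard library. The HTML snippet is hard-coded so the example is self-contained; in a real scraper it would come from an HTTP request.

```python
from html.parser import HTMLParser

# Collect the text of every <h2> heading from a page's HTML.
class HeadingScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings.append(data.strip())

# A hard-coded page keeps the example runnable without a network.
html = "<h1>Title</h1><h2>First</h2><p>text</p><h2>Second</h2>"
scraper = HeadingScraper()
scraper.feed(html)
print(scraper.headings)  # → ['First', 'Second']
```

Third-party libraries make this terser still, but the shape — parse the page, pull out the bits you want — is the same.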
3. It’s easy to transfer a scraper from ScraperWiki Classic. Find your scraper, choose “View source”, then copy and paste the code into the new “Code in your browser” tool. Just make sure you keep the new first line that declares the language, e.g. “#!/usr/bin/env python”.
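For instance, the top of a freshly migrated Python scraper might look like this (the print line is just a stand-in for your pasted Classic code):

```python
#!/usr/bin/env python
# Keep this first line: it tells the platform which language to run
# the script with (e.g. "#!/usr/bin/env ruby" for Ruby).

message = "scraper running"  # your pasted Classic code goes here
print(message)
```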
4. There are tutorials on GitHub, if you want to learn to scrape. They’re on a wiki, so please help improve them! The tutorials work just as well on your own laptop.
5. To run the code, press the “Run” button; to stop it, press “Stop”.
6. The code carries on running in the background even if you leave the page. You can come back and see the output log, or even see a scheduled run happening mid-flow.
7. It has flexible scheduling. As well as hourly, daily and monthly schedules, you can choose the time of day you want it to run.
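Since you can also edit the crontab yourself over SSH, a custom schedule is just a cron entry. A daily run at 06:30 might look something like this (the command is illustrative; check how your box invokes the scraper):

```
# min  hour  day-of-month  month  day-of-week  command
30     6     *             *      *            python code/scraper
```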
8. You can SSH in, if you need to do something the tool doesn’t do. Your scraper is in “code/scraper”. You can install new libraries, add extra files, edit the crontab, access the SQLite database from the command line, use the Python debugger… Whatever you need.
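As one example of what an SSH session lets you do, you can poke at the data with Python’s built-in sqlite3 module. The sketch below uses an in-memory database and a made-up table so it is self-contained; on the box you would open the real database file instead (the filename on your box may differ).

```python
import sqlite3

# On the box you would connect to the real file, e.g.
# sqlite3.connect("scraperwiki.sqlite") -- filename is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swdata (name TEXT, value INTEGER)")
conn.execute("INSERT INTO swdata VALUES ('rows_scraped', 42)")

# List the tables, then dump every row.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
rows = list(conn.execute("SELECT * FROM swdata"))
print(tables, rows)  # → ['swdata'] [('rows_scraped', 42)]
```

The same queries work interactively from the `sqlite3` command-line shell if you prefer.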
9. It’s open source. You can report bugs and make improvements to the tool’s interface. Please send us pull requests!

Want to know more? Try this quick start guide, read the tool’s FAQ, or find out 10 technical things you didn’t know about the new ScraperWiki.