The answer is that, yes, we do, but in an isolated environment. Your own “sandbox” if you like, where you can safely build castles without knocking others over. Or, as The Julian calls it, a “firebox” where you can burn logs without burning down the whole house.
We’re rolling out an upgrade to that environment, changing to a new core technology. We used to use a thing called UML (User Mode Linux), and now we’re changing our sandbox to use a thing called LXC (Linux Containers).
It’s just been deployed, but enabled only for beta test users. Changes are:
- Safe: The scripts now run in better isolation from each other. This is so that we can offer private scrapers securely, making sure they cannot read each other’s data and code.
- Fast: Both in the editor and when scheduled, scrapers and views run a lot quicker. The old system used a particularly slow method to identify scrapers, which made it pause for half a second on every page scraped and every write to the datastore (for Unix geeks, it spawned “lsof” each time). This is now down to a fraction of the time (it just looks at a bridge network IP address).
- Robust: We don’t have any long-running virtual machines any more. LXC is light enough that it effectively “boots up” each time the script is run. After we’ve fixed any bugs in the daemon that manages all this, it should be fundamentally more reliable.
- Updated languages: With the migration, we’re also moving from Python 2.6.2 to Python 2.7.1, and from Ruby 1.8.7 to Ruby 1.9.2. The Ruby move is particularly significant: it should be faster and make scraping Unicode easier.
- Updated libraries: We’ve updated all the 3rd party libraries in the sandbox to their most recent versions.
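For the curious, the IP-based identification mentioned under “Fast” can be pictured as a plain dictionary lookup. This is only an illustrative sketch, not our actual daemon code: the class, the method names and the IP address are all made up, and it assumes each container gets its own address on the bridge network.

```python
# Hypothetical sketch: none of these names come from the real sandbox code.

class ContainerRegistry(object):
    """Maps each container's bridge-network IP to the scraper running in it."""

    def __init__(self):
        self._scraper_by_ip = {}

    def register(self, ip, scraper_name):
        # Called when a container is started for a run.
        self._scraper_by_ip[ip] = scraper_name

    def release(self, ip):
        # Called when the run finishes and the container goes away.
        self._scraper_by_ip.pop(ip, None)

    def identify(self, ip):
        # A single in-memory lookup; no external process is spawned.
        return self._scraper_by_ip.get(ip)


registry = ContainerRegistry()
registry.register("10.0.1.2", "example_scraper")
print(registry.identify("10.0.1.2"))
```

The point is that a lookup like this costs next to nothing per request, whereas spawning an external process every time does not.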
What next? We’ll spend about a week with beta testers, testing the new containers for bugs, compatibility and performance. If you’d like to help test, please do get in touch. We can enable it so that all scrapers and views you own will run in the new LXC environment.
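If you do join the beta, one quick way to check which environment a Python scraper ended up in is to look at the interpreter version, since the old sandbox has Python 2.6 and the new one has Python 2.7. A minimal sketch: the function name is made up for illustration, and it assumes the versions stay as announced above.

```python
import sys


def sandbox_generation(version_info):
    """Guess the sandbox from the interpreter's (major, minor) version."""
    major_minor = tuple(version_info[:2])
    if major_minor == (2, 6):
        return "old sandbox (UML)"
    if major_minor == (2, 7):
        return "new sandbox (LXC)"
    # Anything else is not one of the two versions named in this post.
    return "unknown"


# Report the version of the interpreter actually running this script.
version = "%d.%d.%d" % tuple(sys.version_info[:3])
print("Python " + version + ": " + sandbox_generation(sys.version_info))
```

Put this at the top of a scraper while testing and the run output shows which sandbox it used.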
After that, we will start rolling it out whether you like it or not! This will break some scrapers. Specifically, there are some minor syntax changes in Ruby 1.9, and some of the library upgrades might cause problems. We’ll be eliminating as many of these as possible in the test phase, and will make another announcement before we start rolling it out for everyone. But it is possible that you will have to fix up some of your scrapers. Let us know if you need help fixing them and we’ll do our best to get one of our developers to help you out.
Bear in mind that, after that, everything will be faster 🙂