A faster, safer sandbox to play in

When programmers first hear about ScraperWiki, their initial reaction is often “What! You let anyone edit general-purpose code and run it on your servers?”

The answer is that, yes, we do, but in an isolated environment. Your own “sandbox” if you like, where you can safely build castles without knocking others over. Or, as The Julian calls it, a “firebox” where you can burn logs without burning down the whole house.

We’re rolling out an upgrade to that environment, changing to a new core technology. We used to use a thing called UML (User Mode Linux), and now we’re changing our sandbox to use a thing called LXC (Linux Containers).

It’s just been deployed, but enabled only for beta test users. Changes are:

  • Safe: The scripts now run in better isolation from each other. This is so that we can offer private scrapers securely, making sure they cannot read each other’s data and code.
  • Fast: Both in the editor and when scheduled, scrapers and views run a lot quicker. The old system used a particularly slow method to identify scrapers, making it pause for half a second on each page scraped and on each write to the datastore (for Unix geeks, it spawned “lsof” each time). This is now down to a fraction of the time (it just looks at a bridge network IP address).
  • Robust: We don’t have any long-running virtual machines any more. LXC is light enough that it effectively “boots up” each time the script is run. Once we’ve fixed any bugs in the daemon that manages all this, it should be fundamentally more reliable.
  • Updated languages: With the migration, we’re also moving from Python 2.6.2 to Python 2.7.1, and from Ruby 1.8.7 to Ruby 1.9.2. The Ruby move is particularly significant: it should be faster and make scraping Unicode easier.
  • Updated libraries: We’ve updated all the third-party libraries in the sandbox to their most recent versions.
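
To make the “Fast” point a bit more concrete: identifying a scraper by the bridge network IP address of its container amounts to a table lookup, rather than spawning a process on every request. The Python sketch below shows the general idea only. The class name, the 10.0.1.0/24 range and the scraper name are all made up for illustration, and this is not the code of the actual daemon.

```python
import ipaddress


class ContainerRegistry:
    """Hypothetical lookup table: container IP on the bridge -> scraper name."""

    def __init__(self, bridge_network):
        # The address range of the bridge network (an assumed value, see below).
        self.network = ipaddress.ip_network(bridge_network)
        self._by_ip = {}

    def register(self, ip, scraper_name):
        """Record which scraper a freshly started container belongs to."""
        addr = ipaddress.ip_address(ip)
        if addr not in self.network:
            raise ValueError("%s is not on the bridge network %s" % (addr, self.network))
        self._by_ip[addr] = scraper_name

    def release(self, ip):
        """Forget a container once its script has finished."""
        self._by_ip.pop(ipaddress.ip_address(ip), None)

    def identify(self, source_ip):
        """Return the scraper name for a request's source IP, or None if unknown."""
        return self._by_ip.get(ipaddress.ip_address(source_ip))


if __name__ == "__main__":
    registry = ContainerRegistry("10.0.1.0/24")        # assumed bridge range
    registry.register("10.0.1.17", "example_scraper")  # container starts
    print(registry.identify("10.0.1.17"))              # a request arrives from it
    registry.release("10.0.1.17")                      # container is torn down
```

Each lookup is a single dictionary access, which is why this kind of identification can be so much cheaper than running “lsof” for every page or datastore write.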

What next? We’ll spend about a week with beta testers, testing the new containers for bugs, compatibility and performance. If you’d like to help test, please do get in touch. We can enable it so that all scrapers and views you own will run in the new LXC environment.

After that, we will start rolling it out whether you like it or not! This will break some scrapers. Specifically, there are some minor syntax changes in Ruby 1.9, and some of the library upgrades might cause problems. We’ll be eliminating as many of these as possible in the test phase, and will make another announcement before we start rolling it out for everyone. But it is possible that you will have to fix up some of your scrapers. Let us know if you need help fixing them and we’ll do our best to get one of our developers to help you out.

Bear in mind that, after that, everything will be faster 🙂

