Is scraping legal?

Lots of people, when they hear about ScraperWiki, ask “Is scraping legal? How can you build a business off that?” They usually follow up by saying “we do it in our company, but we would never tell anyone”.

This is strange to us, as we have come from a world of good scraping: taking government data and making it easier for people to use for things that benefit all of society. We’re in favour of that kind of scraping.

It’s obviously a spectrum. At the other extreme, the most evil scraping would be to steal content that somebody else sells, and then republish it in a way that harms their business. We’re against that kind of scraping.

It’s not scraping itself which is good or bad, or legal or illegal, but the circumstances in which you’re doing it.

We’ve written up our policy on legality in full; it’s in our FAQ under ‘What’s your policy on what’s legal to scrape?’. It has lots of detail about robots.txt and takedown notices, and about what is our legal responsibility and what is yours.
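As a rough illustration of the robots.txt part, here is a minimal Python sketch of checking a URL against a site's rules before fetching it. The robots.txt content, the `example.org` URLs and the `my-scraper` user agent are made up for the example. This is not a statement of our policy, and passing this check says nothing about whether a given use of the data is legal.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download this
# from the site's /robots.txt before crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = allowed(ROBOTS_TXT, "my-scraper", "https://example.org/data/report.csv")
private_ok = allowed(ROBOTS_TXT, "my-scraper", "https://example.org/private/report.csv")
```

With the rules above, `public_ok` is true and `private_ok` is false, so a polite scraper would skip the second URL.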

Finally, ScraperWiki isn’t just about scraping.

We’re a data hub, and you need to get data into a data hub. As well as scraping, lots of people make API calls to do that on ScraperWiki, or download their own files from their own servers.

This is much more profound than it sounds – when you are using data for a new purpose, even if it is already structured, you still need to get it and convert it to your new needs. How you do that is a detail that depends on the circumstances.

The difference between parsing HTML web pages and using a JSON REST API is surprisingly small. As an example, Thomas scraped EventBrite even though it has an API (see the post at the end of that thread by Ryan, who works at EventBrite!), because it was easier for him at the time.
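To make that concrete, here is a small Python sketch that reads the same two events once from an HTML fragment and once from a JSON body, and ends up with identical records. The sample data, the field names and the `events_from_html` / `events_from_json` helpers are all invented for illustration; they are not EventBrite's actual pages or API.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample data: the same two events as HTML and as JSON.
HTML_PAGE = """
<ul>
  <li class="event"><span class="title">Data Hack Day</span> <span class="city">Liverpool</span></li>
  <li class="event"><span class="title">Open Data Meetup</span> <span class="city">London</span></li>
</ul>
"""

JSON_RESPONSE = (
    '{"events": [{"title": "Data Hack Day", "city": "Liverpool"},'
    ' {"title": "Open Data Meetup", "city": "London"}]}'
)


class EventParser(HTMLParser):
    """Collects the title and city spans inside each <li class="event">."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._field = None  # name of the span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "event":
            self.events.append({})  # start a new record
        elif tag == "span" and cls in ("title", "city") and self.events:
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Text outside a title/city span (e.g. whitespace) is ignored.
        if self._field:
            self.events[-1][self._field] = data.strip()


def events_from_html(page):
    parser = EventParser()
    parser.feed(page)
    parser.close()
    return parser.events


def events_from_json(body):
    return [{"title": e["title"], "city": e["city"]} for e in json.loads(body)["events"]]


html_events = events_from_html(HTML_PAGE)
json_events = events_from_json(JSON_RESPONSE)
```

Both functions return the same list of dictionaries, so everything downstream (cleaning, storing, publishing) does not need to care which route the data took. The HTML route simply needs a few more lines to find the fields.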

What matters is getting the data, and converting it into a form where it can do something useful for the world. And doing that legally. Whether you’re using Nokogiri or Nestful.


12 Responses to Is scraping legal?

  1. jtownend says:

    Reblogged this on Media law and ethics and commented:

    ScraperWiki is a Liverpool-based data tools service and community I did some work for in 2010/11 and a winner of the Knight News Challenge 2011. In this post, its CEO Francis Irving looks at the legal issues around screen scraping.

  2. Joel says:

    Very interesting post.
    In my opinion, if data is publicly viewable / indexed by search engines, expect it to be scraped. There are ways to prevent scraping, and anyone who really wants to stop their data being scraped should implement them within their website/service.

  3. Although I understand the concern, I find this question absurd, considering that the leading companies in technology make regular use of scraping. Google is nothing more than a very large scraping service that scours the internet for keywords. Need I say more?

  4. Pingback: As unstructured data heats up, will you need a license to webcrawl? — Cloud Computing News

  5. Pingback: As unstructured data heats up, will you need a license to webcrawl? | Apple Related

  6. Pingback: GIASTAR – Storie di ordinaria tecnologia » Blog Archive » As unstructured data heats up, will you need a license to webcrawl?

  7. Pingback: Noah Zimmerman » As unstructured data heats up, will you need a license to webcrawl?

  8. Pingback: As unstructured data heats up, will you need a license to webcrawl?

  9. Pingback: Web Design: If I wanted to create a site that compared the price of ... lets say pencils. Could I legally scrape sites like Staples, and Walmart to get different pencil prices? - Quora

  10. Pingback: Programmier-Crashkurs für Journalisten - UNIVERSALCODE

  11. Pingback: wifi phones: PowerGen Dual Port USB 2.1A 10W AC Travel Wall Charger – White

  12. Mohit Sharma says:

    Is viewing a website/site’s html in a browser illegal? Here’s our take on legality of web crawling/scraping –
