Scraping guides: Parsing HTML using CSS selectors

We’ve added a new copy-and-paste scraping guide, so you can quickly get the lines of code you need to parse an HTML file using CSS selectors. You can get to it from the documentation page.

The HTML parsing guide is available in Ruby, Python and PHP. As with all the documentation, you can choose the language at the top right of the page.

While the library used varies (lxml in Python, Nokogiri in Ruby, Simple HTML DOM in PHP), the principle is the same. You pull the text out of the page using the same selectors you would use to style the page with CSS.

It’s a popular technique – for example, around 30% of Python scrapers on ScraperWiki use lxml.


2 Responses to Scraping guides: Parsing HTML using CSS selectors

  1. Mortimer says:

    To do something with the data, I just added a View guide for google visualization: https://views.scraperwiki.com/run/google_simple_graph_copypaste/

