This weekend saw ideas made reality, collaborations fostered and the future web bloom. The Mozilla Festival was all about making the web, and making it happen in two days! Here at ScraperWiki we like doing that with data, so as well as contributing to the Data Driven Journalism Handbook, we held a quick-fire ScraperWiki round.
And when I say quick I mean ~1hr! With a couple of geeks on hand, some eager journalist types, laptops and our ever-articulate CEO, Francis Irving, we set to work, well, talking about data. The fact is there are many pre-scraping steps to consider:
- What is the general area you are interested in?
- Can you find other people, especially geeks, with that interest?
- When you have done so, you need to find where the data relating to your field of interest lives
- Once you’ve got a list of interesting data, you need to look at its structure (non-programmatically) in order to decide on a hypothesis to test
- Then you need to recruit your geek (who should be involved in all of the above steps) to start deconstructing the data, i.e. seeing what can be scraped
- At this point you all need to work together to decide the schema of the scraper datastore, i.e. the headings and their attributes
- Iterate until your data can answer your hypothesis, or alter your hypothesis to fit what the data can show (it could be that you can mash the scraper with another dataset)
- Get working on answering your hypothesis. The outcome could be a query, a visualization or an application
- Go back to your data and iterate again so that the structure fits your outcome
- Pat yourselves on the back, have a beer and keep in touch for your next project
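For the geeks in the room, here is a minimal sketch of what "deciding the schema" and "answering your hypothesis with a query" might look like. It uses Python's built-in sqlite3 module rather than the ScraperWiki datastore itself, and everything named here (the `spending` table, the column headings, the `save_row` and `totals_by_supplier` helpers) is a made-up example, not the scraper we built on the day.

```python
import sqlite3

# Hypothetical schema agreed by the journalists and the geek:
# each heading and the attribute (type) it should have.
SCHEMA = {
    "council": "TEXT",
    "date": "TEXT",
    "supplier": "TEXT",
    "amount_gbp": "REAL",
}
# Headings that together identify one record, so re-running the
# scraper overwrites rows instead of duplicating them.
UNIQUE_KEYS = ["council", "date", "supplier"]


def create_table(conn, table="spending"):
    """Create the datastore table from the agreed schema."""
    columns = ", ".join(f"{name} {kind}" for name, kind in SCHEMA.items())
    keys = ", ".join(UNIQUE_KEYS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ({columns}, PRIMARY KEY ({keys}))"
    )


def save_row(conn, row, table="spending"):
    """Save one scraped record, rejecting rows that miss a heading."""
    missing = set(SCHEMA) - set(row)
    if missing:
        raise ValueError(f"row is missing headings: {sorted(missing)}")
    names = ", ".join(SCHEMA)
    placeholders = ", ".join(f":{name}" for name in SCHEMA)
    values = {name: row[name] for name in SCHEMA}
    conn.execute(
        f"INSERT OR REPLACE INTO {table} ({names}) VALUES ({placeholders})",
        values,
    )


def totals_by_supplier(conn, table="spending"):
    """One possible outcome: a query that tests the hypothesis."""
    cursor = conn.execute(
        f"SELECT supplier, SUM(amount_gbp) FROM {table} "
        "GROUP BY supplier ORDER BY supplier"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    save_row(conn, {"council": "Example Council", "date": "2011-11-01",
                    "supplier": "Acme Ltd", "amount_gbp": 120.0})
    print(totals_by_supplier(conn))
```

If the query cannot answer your question, that is the cue to go back and change the headings in `SCHEMA`, which is the iteration the steps above describe.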
This may seem a bit much, but this is how you make, iterate, and mediate for the web. The Mozilla Festival proved that this is achievable and enjoyable. In that vein, we got a scraper built in 1hr! So a big cheer to Alex Poderoso for winning the coveted ScraperWiki mug.
To catch up on the MozFest fun, here is the first draft of the Data Journalism Handbook. The festival also premiered an amazing HTML5 documentary called The One Millionth Tower. You can catch up with all the rest, including teaching kids to code with Hackasaurus and hacking video with Popcorn.js (and an octocopter!), and loads more at the Mozilla Festival website.