At our New York datacamp, we set out to liberate data, teach people to liberate data, and find stories in data.
About 100 people showed up for the event, and about 40 of them attended the Learn to Scrape sessions.
Mike Caprio and team cleaned a spreadsheet of 80,000 records from the New York lobbyist website to power a site on New York lobbyists based on the Chicago Lobbyists site. It appears that $120 million was spent in New York on lobbyists in 2011.
Michael Keller, Marc Georges et al. related the NYPD stop, question and frisk data to nine mosques referenced in an NYPD report on surveillance in order to see whether there had been unusual changes in stopping activity around these mosques.
The dataset is insanely messy, but they fortunately had access to a relatively clean version that Data Without Borders had developed in November.
I helped one team relate contracts from Open Book New York to data that they had scraped by hand (!) from handwritten forms in order to identify potential conflicts of interest.
I helped another team identify potential stories (outliers) in the NYC Open Data graffiti locations dataset.
Susan McGregor was awarded Honorary ScraperWikian. We haven’t decided what that means yet. 🙂
Teaching the Learn to Scrape sessions and working with many of the project teams, I got the impression that we had gotten participants thinking more about how data can be scraped, transformed and analyzed to identify unusual subsets and potential stories.
Our Learn to Scrape sessions seemed to work as well; several participants who had claimed no knowledge of web scraping before the sessions were creating reasonably complex scrapers by the next afternoon.