Data Science London 12th June – a speaker speaks

Data Science London run an approximately monthly programme of evening events comprising short talks, beer and pizza. Last week I was invited to give a talk on Scraping and Parsing PDF using Python.

The venue for these events is the Westminster Hub in central London – we were diverted in our approach by the premiere of Man of Steel in Leicester Square.

The audience was large, friendly and very diverse. Most, if not all, of the audience were highly technical. There were men in suits and ties, and people with piercings, t-shirts and shorts. There were academics, web developers, economists and political science students.

There were four speakers on the evening:

  1. Rosaria Silipo from Knime presented on using their platform to process social discourse data from Slashdot, analysing it for sentiment and for user roles in the community. The content from Slashdot acted as a substitute for content from a telecoms forum in which their commercial clients were interested.
  2. I spoke on scraping and parsing PDF files, giving some details of the Python libraries we commonly use and illustrating with examples from my Royal Society membership list parsing and the verbatim records of the UN General Assembly and Security Council. I’ll write about this second project another time. The audience were very responsive (they laughed at my jokes) and there were some good questions at the end.
  3. Third up, after a brief pause for me to fetch a beer and wind down, was Doug Cutting, inventor of Hadoop, who now works for Cloudera. He spoke about adding search capabilities to Lucene and Hadoop. I suspect he may have been the reason for the packed house.
  4. Finally, Ian Ozsvald from Mor Consulting spoke about brand name disambiguation for Twitter, i.e. knowing when someone is talking about Apple the brand or apple the fruit. There are tools for this type of problem, but they appear to have been trained on longer-form media and so do not perform well with short-form sources such as Twitter. Ian demonstrated an approach using the scikit-learn machine learning package for Python. This was a work in progress, for which he is looking for collaborators.

All in all a very enjoyable and interesting evening. I can heartily recommend Data Science London events if you get a chance to go.

Finally a big thank you to Carlos for organising such a great event.

Full ‘New Zealand’ House!

About Ian Hopkinson

I've worked as a scientist for the last 20 years, at various universities, a large health and personal care company and now @ScraperWiki, a software startup. I am @SmallCasserole on Twitter.