#InspiringWomen – catching twitter with ScraperWiki

Commodore Grace Hopper

Those of you on twitter may have caught the recent #InspiringWomen hashtag. It was a response to the online abuse and threats received by many women in the public eye. On Sunday 4th August people tweeted about women who inspired them, marking their tweets with the #InspiringWomen hashtag.

#InspiringWomen was launched by my friend, @daintyballerina. In the aftermath she asked: how to capture all of these tweets? The problem is that the number of tweets involved (about 40,000) does not sit well with many online tools, since that many tweets take a long time to download.

This is a job for the ScraperWiki Search for tweets tool! Getting this going is simply a matter of typing in a search term and clicking a couple of buttons. The search tool uses the free twitter API, so it takes a little while to get such a big set of tweets, but this is OK since the ScraperWiki platform will chug away getting tweets even if you’ve switched off your computer. Anyway, overnight I had collected 40,000 tweets.

I offered to supply all of the tweets, with 16 pieces of information for each tweet, to @daintyballerina as an Excel spreadsheet. This is easy to do with the Download as spreadsheet tool. However, we agreed that it would be best if I supplied the unique tweets along with the name of the twitter user and the number of retweets, in descending order of popularity. Looking at the 20,000 or so remaining tweets, I then decided to limit the list to just the tweets with at least one retweet. This set of data can be generated from the tweets using the Query with SQL tool. SQL is a database querying language of long pedigree. I then exported the tweets to Word and sent them on to @daintyballerina.
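The kind of query described above can be sketched as follows. This is only an illustration, not the query I actually ran: the table name `tweets` and the column names `text`, `screen_name` and `retweet_count` are assumptions, and the sample rows are made up so that the example runs on its own with Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical schema: the table and column names are assumptions,
# not necessarily what the Search for tweets tool produces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (text TEXT, screen_name TEXT, retweet_count INTEGER)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        ("Ada Lovelace #InspiringWomen", "alice", 12),
        ("Ada Lovelace #InspiringWomen", "alice", 12),  # duplicate row
        ("Grace Hopper #InspiringWomen", "bob", 3),
        ("My mum #InspiringWomen", "carol", 0),  # never retweeted
    ],
)

# Unique tweets with at least one retweet, most retweeted first.
QUERY = """
    SELECT text, screen_name, MAX(retweet_count) AS retweets
    FROM tweets
    WHERE retweet_count >= 1
    GROUP BY text, screen_name
    ORDER BY retweets DESC
"""

rows = conn.execute(QUERY).fetchall()
for text, screen_name, retweets in rows:
    print(retweets, screen_name, text)
```

Grouping by the tweet text and user collapses duplicate rows, and the `WHERE` clause drops tweets that nobody retweeted.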

You can see the archive of tweets at the Inspiring Women Project website. For the impatient the most retweeted five nominations were:

  1. Emma Watson
  2. Ada Lovelace
  3. Delia Derbyshire
  4. JK Rowling
  5. Hedy Lamarr

But the fun doesn’t stop there!

The Summarise Automatically tool gives you a quick view of any dataset, making a guess at how best to show the data. For example, a column with lots of text is shown as a wordle:

Inspiring Women Wordle
The wordle skips common words like if and it.
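For the curious, the counting behind a word cloud like this can be approximated in a few lines of Python. This is a rough sketch rather than how the Summarise Automatically tool works internally: the `STOP_WORDS` set and the `word_counts` function are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical list of common words to skip; a real list would be longer.
STOP_WORDS = {"if", "it", "is", "the", "a", "and", "to", "of", "in", "for"}

def word_counts(tweets):
    """Count words across tweets, ignoring case and skipping common words."""
    counts = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

counts = word_counts(["If it is Ada Lovelace", "Ada is it"])
print(counts.most_common(2))
```

The more often a word appears, the larger it would be drawn in the wordle.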

The twitter API returns both media and URL data, and both contain at least some images. Columns with links to images are shown as a montage of the most frequent images:

Inspiring Women

On the ScraperWiki platform these images are linked and you can use the Google Image Search to find out who they are. From top left to bottom right, these images show Emma Watson, Lucille Bluth – fictional character from Arrested Development, Girls’ Generation – South Korean girl band, hairy guy – something of a gay pinup, Marilyn Monroe, Donald Trump (I don’t know!), Rosa Parks, “Hehalutz women captured with weapons” – from the report of Jurgen Stroop on the Warsaw Ghetto Uprising, Nellie Spindler – the only woman to die at the Ypres Salient in the First World War, Mary Shelley, Nancy Pelosi, Hayley Williams – lead vocalist in Paramore, Caribbean woman from the Auxiliary Territorial Service in WWII, Women’s Land Army recruitment poster, Violette Szabo – Allied spy executed by the SS and finally a model wearing a cardigan. It’s possible the model wearing a cardigan is someone famous.

Summarise Automatically also produces a histogram of when tweets with the #InspiringWomen hashtag were sent. This is OK, but the resolution of one day is a bit low, so I downloaded the data as a spreadsheet and viewed it using Tableau Public. This works nicely – except Tableau is very picky about date-time formats and I had to create a new column in Excel to get it to import. Thanks to Tableau Public I can not only show you the result as an image:

InspiringWomenTimeline
…but you can also play with an interactive version of the plot on Tableau Public, here. This allows you to see the underlying data, download it as an image or download it to Tableau Public on your computer to change as you wish.

The chart shows the number of tweets per minute from late on Saturday 3rd August through to the morning of 6th August; higher peaks mean more tweets.
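If you would rather do the per-minute counting in code than in a spreadsheet, something like the following Python sketch would do it. I am assuming here that the timestamps are in the format the twitter API uses for its `created_at` field (for example `Sun Aug 04 11:00:05 +0000 2013`); the data you download may be formatted differently, and `tweets_per_minute` is just a name made up for this example.

```python
from collections import Counter
from datetime import datetime

# Assumed input format, as used by the twitter API's created_at field.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def tweets_per_minute(created_at_values):
    """Count tweets in each one-minute bin, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for value in created_at_values:
        stamp = datetime.strptime(value, TWITTER_FORMAT)
        counts[stamp.strftime("%Y-%m-%d %H:%M")] += 1
    return counts

per_minute = tweets_per_minute([
    "Sun Aug 04 11:00:05 +0000 2013",
    "Sun Aug 04 11:00:59 +0000 2013",
    "Sun Aug 04 11:01:00 +0000 2013",
])
for minute, count in sorted(per_minute.items()):
    print(minute, count)
```

The keys are written as plain year-month-day hour:minute strings, a simple, sortable format for loading into other tools.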

As you can see, the discussion of #InspiringWomen started on the evening of Saturday 3rd and rose during the morning until 11am, when everyone woke up and started tweeting. The big peak at just after midnight on Monday 5th August is the nomination for Emma Watson (the actress who played Hermione in the Harry Potter films), which was then heavily retweeted. The very thin big spike at 6am the same morning is spammers randomly re-tweeting one of the #InspiringWomen tweets – you can tell this by looking at the user names, which are all a bit random looking. Presumably they appear in a big spike because otherwise twitter would filter them automatically.

What would you want to know about a hashtag?
Let me know.

Footnote

The picture at the top of this post is Grace Hopper, who popularised the term “bug” with reference to programming, although in her case bugs were literally bugs – insects found on the valves of early computers. On the day, I nominated my mum as an #InspiringWoman. She started programming in the early 1960s, but she wouldn’t want me to share her picture with the world!

About Ian Hopkinson

I've worked as a scientist for the last 20 years, at various universities, a large health and personal care company and now @ScraperWiki, a software startup. I am @SmallCasserole on Twitter.