Adventures in Tableau – loading files

TableauLogo

Tableau is a widely used visualisation tool, particularly in the business intelligence area. It grew out of the Polaris project at Stanford University, subtitled “interactive database visualisation”. This is worth bearing in mind since it is the context in which Tableau deals with data. It anticipates that the data you are interested in comes in the form of database tables which has a downside in terms of flexibility for the data it can import and an upside in terms of connecting tables of data in the form of conventional database joins.

The ScraperWiki platform makes a nice fit with Tableau since we specialise in sourcing and offering up cleaned data whilst Tableau specialises in visualising clean data; we’ve found it to be a powerful tool in this respect.

I’m planning a series of posts on Tableau, my aim is not to write a list of recipes i.e. “Click this to get this” but to describe how Tableau works, thus making it easier to address new challenges. I’m using Tableau Version 8.0 Professional. I’ll start today with a post on how to load data into Tableau, future posts will cover measures and dimensions, calculated fields, creating visualisations, filtering and so forth (to cover the fact I haven’t planned everything out in advance!).

This first post is about the process of loading data into Tableau. Tableau makes a clean separation between data and visualisation. The first step in making a visualisation is connecting to the data, accessed via the Data->Connect to Data menu item, or the Connect to Data link in the panel at the top left of the screen. This data connection can be to one of the following local sources:

  • Microsoft Access
  • Microsoft Excel
  • Text file
  • Tableau data extract
  • Import from workbook

Tableau data extracts (*.tde) are Tableau’s native, binary data format, typically they are built by importing data from other sources. They can also be generated programmatically using the Tableau Data Engine API. A workbook is Tableau’s label for a visualisation file, import from workbook just gets the data used in another workbook.

Alternatively, in the Professional version, there is a list of 30 or so server based sources including SQL databases of various types, Hadoop based databases, data warehouses, and a couple of services such as Google Analytics and Salesforce.

Tableau data sources

Tableau data sources

It is also possible to do a simple copy-paste from a source like Excel using the Data->Paste Data menu option. For the Excel and Text files options your files need to be formatted in a particular way – essentially to look like database tables. That’s to say with a single column header line (maximum), and for each column to contain the same type of data.

Once you have selected a data source you are given options, which vary a little bit between sources.

Tableau File Import dialog

Tableau File Import dialog

The key variation between these methods is in the connection/import options, the database connections need authentication and connection information. The interesting part is in the variously named table selection part: for all data sources it is possible to select multiple tables at import and join them together, doing this enables you to add new columns to a parent table using a common column as a key. For example, you may have a column in your main table which is a short code for something (e.g. vehicle fuel type), in another small table you might have a lookup which gives the long name for these short codes. Joining the tables at load adds this long name column whilst keeping the short code. There is an alternative method of data blending which keeps the data sources separate whilst making a connection between them.

Finally, you are asked whether you want to Connect Live to the data, Import All data or Import Some Data. Connect Live means that Tableau accesses the original file rather than producing a Tableau Data Extract.

Tableau connection type dialog

Tableau connection type dialog

Once you configured a data connection you can save it for future reuse in using the menu item Data->[Your data connection name]->Add to saved data sources. Saved Data Sources (*.tds) are small xml files. There are a whole bunch of options on the Data->[Your data connection name] menu.

Tableau Data Menu

Tableau Data Menu

Generally they enable you to make changes to the options chosen when setting up the data connection.

Tableau uses the Microsoft database engine to connect to Excel and text based sources, it will automatically decide what type of data appears in each column (i.e. string, number, date) sometimes, for CSV files, it gets it wrong in which case you need to use this trick.

You can check your data has loaded correctly using View Data, variously found as an icon at the top of the Dimensions pane, or on the menu item Data->[Your data connection name]->View Data… Once the data has been successfully loaded a list of variable names appears in the Dimensions and Measures panels at the side of the screen, more of these in the next post.

Are you interested in a Tableau connector tool for ScraperWiki? If so let me know at ian@scraperwiki.com

About Ian Hopkinson

I've worked as a scientist for the last 20 years, at various universities, a large health and personal care company and now @ScraperWiki, a software startup. I am @SmallCasserole on Twitter.
This entry was posted in Data Science and tagged . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s