Data visualisation is an abstraction

15 Mar 2018

Beers and Burrows define data visualisation as the visual representation of data and datasets in a way that communicates precise information and values [1]. In that sense, anything that does not lead back to precise information and values in the dataset is not a data visualisation. As information design evolves, data visualisation is becoming harder to define. Force layouts, for instance, are not precise representations of the data. They abstract it to such a level that it is almost impossible (if not impossible) to get back to the actual data underlying the relationships between the nodes. The distance between nodes in a balanced force layout is irrelevant; it is just the best layout the non-deterministic algorithm could find. An edge between two nodes symbolises a relationship between them, but it is not apparent how ‘closely’ those nodes are connected. Some visualisations try to encode this ‘closeness’ in the stroke weight or colour of the edge. From visual perception studies, however, we know that humans (the target audience) judge position, distance and length more accurately than thickness or colour.
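To make the point about force layouts concrete, here is a minimal sketch of a toy spring model in plain Python. It is an assumption-laden illustration, not the algorithm of any particular library: the function name `force_layout`, the constants and the four-node graph are all made up for this example. It runs the same graph twice from different random starting positions and compares the results.

```python
import math
import random


def force_layout(nodes, edges, seed, iterations=200, k=1.0):
    """Toy spring model (hypothetical): all pairs repel, linked pairs attract."""
    rng = random.Random(seed)
    # Random starting positions are the source of run-to-run variation.
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along each edge.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, capped by a step size that shrinks over time.
        temp = 0.1 * (1 - step / iterations)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, temp)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos


def distance(layout, a, b):
    """Distance between two nodes as drawn on screen."""
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])


# Illustrative data: a chain of four nodes. The edge list is the "real" data.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]

layout_1 = force_layout(nodes, edges, seed=1)
layout_2 = force_layout(nodes, edges, seed=2)

# Same data, different drawings: the coordinates depend on the starting seed.
print(layout_1["a"], layout_2["a"])
print(distance(layout_1, "a", "d"), distance(layout_2, "a", "d"))
```

The two runs share exactly the same nodes and edges, yet the printed coordinates differ. Nothing in the picture tells the reader which parts of the geometry come from the data and which come from the random start.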

Is there a loss of data when we turn it into geometric shapes/mathematical representations?

One could easily argue that even with Excel sheets there is a loss of data because we cannot absorb all the numbers at once, and that argument would be true. Visualisations are not perfect tools that let humans process all the data in their heads at once; in fact, that inability is what led to the invention of the information design field. This loss of data, or let’s call it data abstraction, matters more these days than ever before because of how ordinary people (not data experts or the data-literate) engage with visualisations. Data visualisation has become a medium for making arguments (climate change visualisations), reporting news (the 2016 US election), marketing, advertising and much more.

According to Periscopic, a data visualisation studio, data can be used to do good. The claim is incredibly abstract and hard to measure. Visualisation designers believe that they can do good with data by being true to their datasets [2]. In The work that visualisations do [2], one of the designers is quoted as saying:

Sometimes it happens [that clients want to tell stories that are not present in the data], but then we just show them that the story isn’t there and that we cannot force the data to tell that story.

One of the interviewees in the same paper said: “The data can only be the shape it is”. The intentions of visualisation designers may be admirable and as neutral as possible, but the visualisations they produce might still hide the truth, partially or completely [3]. An unintentional lie of omission can always happen, as a designer is also human and cannot perceive the data all at once. Sometimes even a design decision can lead to the omission of important data. After all, data visualisations are just another abstraction of the data we have.

Measuring data abstraction

Edward Tufte’s data-ink ratio quantifies the idea of information density in a data visualisation. Could there be a similar way to measure abstraction?
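For reference, Tufte’s ratio is usually stated as the share of a graphic’s ink that presents data. The wording of the terms below paraphrases his definition:

```latex
\[
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
\]
```

A comparable measure of abstraction would need its own numerator and denominator. What those should be is exactly the open question.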