Rethinking the visualization of weighted directed networks

November 4, 2011

Hello! My name is Suet. I’m a senior at Northwestern University, and most recently Datascope employee #3. This summer I worked on a project to design a visualization tool for Northwestern University’s Graduate School. You can read more about the project in detail here — the basic idea is that the Graduate School wanted to provide a means for prospective students to explore what their curriculum would be like at Northwestern and to illustrate the interdisciplinary nature of graduate study at Northwestern.

Network diagrams are a great way to show interdepartmental connections, something that Datascope has done for several organizations at Northwestern, namely McCormick, ISEN and the Segal Design Institute. After some brief discussions with the folks at Northwestern, we quickly realized that the Graduate School project required us to go back to the drawing board (or the whiteboard, here at Datascope) to address some unique challenges associated with the project that were not present in the others:

It is a directed network. For every pair of nodes, say A and B, there can be a link going from A to B and another going from B to A. The Graduate School wanted to know if an academic program, like History, was both (a) encouraging its students to take courses outside of History, and (b) welcoming students from other programs to take its History courses. Simply put, is there reciprocity? When there’s traffic in, is there also traffic out?
It is a weighted network. Not only was the directionality of flow important, so was the proportion. What if 100 students in History took classes in English, but only 1 English student took a class in History? This isn’t exactly the ideal kind of reciprocity we were hoping for, and directed links alone wouldn’t have been enough to tell us that. We wanted to somehow incorporate the proportion/strength/magnitude of these links too.
Self-links had to be represented. In this context, a self-link is when students take classes in their own home programs, which is common, if not the norm, for any PhD curriculum. Again, we were interested in comparing the proportion of self-links to links that were coming in or going out of the program to understand if the program was truly collaborative.
The network develops and changes over time. In fact, this was one of the main aims of the visualization — to show that collaboration among the academic programs has increased over the years since the introduction of the clusters initiative. We would compare ‘snapshots’ of each network at a particular time.
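
To make the four requirements above concrete, here is a minimal Python sketch of one way such data could be held: one weighted, directed edge map per year, with self-links stored like any other edge. The record layout, the function names, and most of the numbers are made up for illustration (only the 100-versus-1 History/English example comes from the text above), and the reciprocity ratio is just one simple measure I am assuming here, not necessarily what the tool computes.

```python
from collections import defaultdict

# Hypothetical records: (year, home_program, course_program, students).
# Only the 100 vs. 1 History/English pair echoes the example in the post.
records = [
    (2010, "History", "History", 40),   # self-link: History students in History courses
    (2010, "History", "English", 100),
    (2010, "English", "History", 1),
    (2010, "English", "English", 60),   # self-link
    (2011, "History", "English", 80),
    (2011, "English", "History", 20),
]

def build_snapshots(records):
    """Group records into one weighted directed edge map per year."""
    snapshots = defaultdict(lambda: defaultdict(int))
    for year, home, course, students in records:
        snapshots[year][(home, course)] += students
    return {year: dict(edges) for year, edges in snapshots.items()}

def reciprocity(edges, a, b):
    """Weaker flow divided by stronger flow between a and b (0 = one-way, 1 = balanced)."""
    ab = edges.get((a, b), 0)
    ba = edges.get((b, a), 0)
    if max(ab, ba) == 0:
        return 0.0
    return min(ab, ba) / max(ab, ba)

def self_link_share(edges, program):
    """Fraction of a program's outgoing enrollments that stay in the home program."""
    total_out = sum(w for (src, _), w in edges.items() if src == program)
    if total_out == 0:
        return 0.0
    return edges.get((program, program), 0) / total_out

snapshots = build_snapshots(records)
```

Comparing `reciprocity(snapshots[2010], "History", "English")` with the same call on `snapshots[2011]` is the kind of snapshot-to-snapshot comparison described in the last point.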

It was important that all these characteristics were properly represented in the diagram. Put together, we wanted a directed, weighted, and time-specific network that also displayed self-links. Whoa! This is a lot of information. One of the important lessons we learned through the many visualizations we’ve come across — both good and bad — is that while a visualization should aim to be informational, more information isn’t always better. In fact, “more” is often confusing. To avoid this trap, we designed the visualizations to be as simple as possible without dumbing down the message.

Force-directed network layouts are good for seeing macroscopic “clumps” of connectivity, but they tend to become overwhelming as more details are added. Rather than showing all of the link weights and orientations at once, we took a minimalistic approach that emphasizes only the clumpiness. We simplified the weighted, directed network into an unweighted, undirected representation; the link weights and directions are revealed only after a user clicks on a particular program. This lets a user explore the network in detail with the irrelevant stuff stripped away and the details shown only on demand.
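
The simplification described above can be sketched in a few lines of Python. This is an illustration under my own assumptions rather than the code behind the tool: the edge map and both function names are hypothetical, and dropping self-links from the overview is a choice made here for the sketch.

```python
# Hypothetical weighted directed edges: (home_program, course_program) -> students.
edges = {
    ("History", "History"): 40,
    ("History", "English"): 100,
    ("English", "History"): 1,
    ("English", "English"): 60,
    ("Physics", "Math"): 12,
}

def simplify(edges):
    """Collapse to unweighted, undirected links for the overview layout.

    A frozenset ignores direction, so A->B and B->A become a single link.
    Self-links are dropped here (an assumption for this sketch).
    """
    return {
        frozenset((src, dst))
        for (src, dst), weight in edges.items()
        if src != dst and weight > 0
    }

def details_for(edges, program):
    """Weights and directions for one program, shown only after it is selected."""
    incoming = {src: w for (src, dst), w in edges.items()
                if dst == program and src != program}
    outgoing = {dst: w for (src, dst), w in edges.items()
                if src == program and dst != program}
    return {"in": incoming, "out": outgoing}

overview = simplify(edges)               # what the force-directed view draws
selected = details_for(edges, "History")  # what a click on History reveals
```

The overview carries only who is connected to whom; the full weights stay in `edges` and are looked up when a program is selected.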

What about self-links? A rather coarse way of doing it would have been to draw an arc from a node back to itself, but that would have been… eurgh. During our brainstorming process we came across some neat flow diagrams, formally known as Sankey diagrams. (See examples here and here.) We thought that this could be an elegant solution to the problem. A self-link was simply a flow from a program on the left to the same program on the right. The concept of flows was intuitive and easy to comprehend visually. The bars representing the programs were sized according to the size of each program, allowing for easy comparison of the proportions of flows.
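
As a rough sketch of the data a two-column flow diagram like this needs, the Python below turns a weighted directed edge map into left bars, right bars, and flows. Everything in it is an assumption for illustration: the numbers are invented, the names are hypothetical, and sizing each bar by the total enrollment passing through it is one plausible reading of "size of the program", not a description of the actual tool.

```python
# Hypothetical weighted directed edges: (home_program, course_program) -> students.
edges = {
    ("History", "History"): 40,
    ("History", "English"): 100,
    ("English", "History"): 1,
    ("English", "English"): 60,
}

def sankey_layout(edges):
    """Return (left_bars, right_bars, flows) for a two-column flow diagram.

    left_bars[p]  = enrollments by students whose home program is p
    right_bars[p] = enrollments in courses offered by p
    A self-link is an ordinary flow whose source and target are the same program.
    """
    left_bars, right_bars = {}, {}
    for (src, dst), weight in edges.items():
        left_bars[src] = left_bars.get(src, 0) + weight
        right_bars[dst] = right_bars.get(dst, 0) + weight

    flows = [
        {
            "source": src,
            "target": dst,
            "weight": weight,
            # Band thickness relative to the source bar, for comparing proportions.
            "share_of_source": weight / left_bars[src],
            "self_link": src == dst,
        }
        for (src, dst), weight in sorted(edges.items())
        if weight > 0
    ]
    return left_bars, right_bars, flows

left_bars, right_bars, flows = sankey_layout(edges)
```

No special case is needed for self-links: they fall out of the same loop as every other flow, which is what made the left-to-right framing attractive.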

These two network views can stand alone as separate visualizations, but having them side by side makes the strengths and weaknesses of each easier to see, along with the circumstances in which one is more powerful than the other. Personally, I had a ton of fun learning and working on this project. For anyone who has had the time to explore the site, I hope that it has been fun and useful for you as well!

As always, we welcome your comments and suggestions so that we may improve our approach to visualizing weighted directed networks in the future. Do you prefer the force-directed layout or the Sankey diagram? What do you think about the interactivity? All comments are welcome and encouraged!