The persistence of underdogs, part 2

March 26, 2011

Based on our critique of Leonardo Aranda's diagram, we set out to create a diagram of our own that does a better job of describing how the different seeds progress through the NCAA basketball tournament. A quick search revealed a helpful script by Matthew Beckler that parses data from 1985 to 2010. A little analysis and a quick PyGrace script later, and "Voilà!": we have our first version of a diagram that shows the fraction of times that each seed appears in a particular round of the tournament.

[Figure: ugly graph showing the fraction of seeds per NCAA tournament round]

Importantly, this diagram aggregates all of the regions into the same analysis — the only thing that makes sense in this case — and it preserves the height between subsequent rounds so one can quantitatively compare successive rounds. These improvements make it obvious, for instance, that the 1-seeds win the tournament over half of the time and that a 5-seed has never won the tournament.
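
For readers who want to reproduce the aggregation, here is a minimal sketch of how the per-round fractions could be computed once the regions are pooled. It is not the script we actually used: the input format (a list of `(year, region, seed, round_number)` tuples) and the function name `seed_fractions` are assumptions made for illustration, and the toy records are invented rather than real tournament results.

```python
from collections import Counter, defaultdict


def seed_fractions(appearances):
    """Return {round_number: {seed: fraction}} pooled over all years and regions.

    `appearances` is a hypothetical input format: one
    (year, region, seed, round_number) tuple for every team that
    played in that round.
    """
    counts = defaultdict(Counter)
    for _year, _region, seed, round_number in appearances:
        # Year and region are deliberately ignored so that every
        # region and every season lands in the same tally.
        counts[round_number][seed] += 1

    fractions = {}
    for round_number, seed_counts in counts.items():
        total = sum(seed_counts.values())
        fractions[round_number] = {
            seed: n / total for seed, n in seed_counts.items()
        }
    return fractions


# Invented toy records, only to show the shape of the input.
toy_appearances = [
    (2009, "East", 1, 1), (2009, "East", 16, 1),
    (2009, "West", 1, 1), (2009, "West", 16, 1),
    (2009, "East", 1, 2), (2009, "West", 16, 2),
]
toy_fractions = seed_fractions(toy_appearances)
```

Because each round is normalized by its own total, the fractions in every round sum to one, which is what lets successive rounds be compared at the same height.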

In our quick first pass, we chose to use a diverging color scheme to illustrate the differences between 9-16 seeds (shades of purple) and 1-8 seeds (shades of orange). Although this does a good job of highlighting how long 9-16 seeds persist in the tournament, one could imagine improving the color scheme to further illustrate how long other underdogs persist. For example, if the seeding were perfect, there would be no 9-16 seeds after the First Round, no 5-8 seeds after the Second Round, and so on; a well-chosen color scheme could make this more obvious.
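
The "perfect seeding" rule can be written down in a few lines, which might be one way to drive such a color scheme. This is only a sketch under stated assumptions: rounds are numbered from 1 (the First Round, with 16 teams per region), and the helper names `max_expected_seed` and `is_overperformer` are hypothetical.

```python
def max_expected_seed(round_number, teams_per_region=16):
    """Highest seed number that should still be playing under perfect seeding.

    Round 1 is assumed to be the First Round, with all 16 seeds of a
    region present; the field halves every round after that.
    """
    return teams_per_region // 2 ** (round_number - 1)


def is_overperformer(seed, round_number):
    """True if this seed has outlasted what perfect seeding would predict."""
    return seed > max_expected_seed(round_number)


# Seeds expected in each of the first four rounds of a region.
expected_by_round = {r: max_expected_seed(r) for r in range(1, 5)}
```

A color scheme could then use one hue for seeds where `is_overperformer` is false and another for seeds where it is true, so that every underdog stands out in the round where it should already have been eliminated.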

Another thing that we find unsettling is that this diagram is too “orthogonal” (to put it nicely; “fugly” is perhaps more candid). This makes the visualization difficult to follow, geeky, and consequently a little inaccessible.

Ideas on further ways to improve the diagram are welcome! More to come in following posts.