December 23, 2011

As the year winds down, we want to thank all of the people who have contributed to our success this year: our clients, colleagues, independent contractors, advisors, friends, and family. As a token of our gratitude, we put together a “map of Christmas carols” for your holiday pleasure.

In the map, Christmas carols are represented by circles. The lines between circles represent the similarity between the carols' lyrics; thicker, darker lines indicate that the carols are more similar. There appear to be two visually distinct clusters of carols, roughly corresponding to religious and secular carols!

Mouse over a circle to see the name of the corresponding carol. Click on a circle to view the lyrics of the carol. Click and drag a circle to move it around the network.

[Image: static image of the Christmas carol map]

Data for the diagram come from an awesome 90s-era website. We use nltk to remove stop words and stem the remaining text of each carol, and we use gensim to assess the similarity of carols based on the cosine similarity between their term frequency–inverse document frequency (TF-IDF) vectors. We then construct a network from the 150 most similar pairs of carols and display everything using d3's force-directed network layout.
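For the curious, here is a rough sketch of the idea in Python. It is not the code behind the map: to stay self-contained it uses only the standard library in place of nltk and gensim, skips stemming, and relies on a tiny made-up stop-word list. The carol names and lyrics are placeholders, the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `top_pairs`) are ours for illustration, and the TF-IDF weighting is a plain `tf * log(N / df)`, which may differ from gensim's defaults.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical, tiny stop-word list; the post relied on nltk's list plus stemming.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "on", "is"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words (no stemming here)."""
    words = "".join(ch if ch.isalpha() else " " for ch in text.lower()).split()
    return [w for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf * idf} dictionary."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        vectors[name] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_pairs(docs, k):
    """Return the k most similar pairs as link records with source, target, weight."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [{"source": a, "target": b, "weight": s} for s, a, b in scored[:k]]


# Placeholder lyrics, invented for this example.
carols = {
    "carol_a": "Holy night, the angels sing of a holy star",
    "carol_b": "Angels sing on a holy silent night",
    "carol_c": "Sleigh bells ring and snow is falling",
}
edges = top_pairs(carols, k=2)
```

With the real data, `k` would be 150, and the resulting list of source/target records could be serialized to JSON and used as the links of a force-directed layout, with `weight` driving line thickness and darkness.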

Happy Holidays!

