Hello budding data viz lovers, and welcome to a post that will put the division in Joy Division. When it’s time to explain patterns and numbers to a layperson, it’s usually done with the help of a graphic of some sort. Why? Because numbers are abstract, and visual aids are useful tools for making the abstract concrete. But perhaps a graphic has another quality, too.
A great genre of argument is the “this seemingly boring and mundane activity is actually an art and full of wonders” argument. It’s a hit because it’s always true. We humans will assign meaning to anything we spend any amount of time doing. Anything and everything has some great novel/film/etc. about it and an r/ page full of memes. Data and statistics are no different. There’s r/dataisbeautiful, some Neal Stephenson novels, and Moneyball. Without rethinking baseball management, scouting, and talent evaluation with the aid of analytics, how else would Brad Pitt have reconnected with his daughter (or gained a pioneer/icon status that has let him keep his job for so long despite little success)?
Data visualization bridges the gap from something that is obviously artistic (making visual images) to something that is only an art if you explain it (working with data). Thus, data visualization is an art in and of itself, and it’s something that is treated with care and respect in order to enrich the mundane into something full of meaning and import. And since people who work in data visualization can be said to have an appreciation of art, and enjoy popular music, a new type of data visualization emerged recently: the Joy Plot.
Some casual Wikipedia reading informs me that the designer of Joy Division’s Unknown Pleasures album cover found an image from a dissertation, rearranged the white and dark of the image, and used it for the album cover. Not unlike what I do in this series and will do in this post, the designer stole (?) someone’s work, made a few simple changes, and repackaged it. And he did it because it’s so freaking cool.
Since its appearance in a FiveThirtyEight article (and possibly before that, I don’t know), people have written code to make Joy Plots in R with available plotting tools, and an R package has been written and distributed specifically for this plot. The Joy Plots I will make visualize the density of some Sputnik rating data from my year end rating rankings list. A density plot shows how a set of data is distributed. It looks like a hill or bell, and it’s higher on the y-axis in areas where more data points exist. If you are familiar with histograms, like the ones you see on review pages of albums on this site, density plots are essentially histograms that have been smoothed. Since the site already provides histograms of the ratings for albums (at least as long as an album has a review), it would be boring to recreate those with Joy Plots. So instead I will simulate some data from the ratings.
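To make the “smoothed histogram” idea concrete, here is a minimal sketch in Python (not the R that the actual plots use). It assumes a Gaussian kernel and a hand-picked bandwidth, which is my choice for illustration and not necessarily what ggjoy does under the hood. The function name `gaussian_kde` and the toy numbers are made up.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    if not samples:
        raise ValueError("need at least one sample")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Each sample contributes one Gaussian bump; this constant makes the
    # total area under all the bumps equal to 1.
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Made-up values purely for illustration, not real site data.
toy_samples = [3.5, 3.6, 3.6, 3.7, 4.4]
toy_density = gaussian_kde(toy_samples, bandwidth=0.1)
# The curve is taller near 3.6, where the samples cluster, than near 4.4.
print(toy_density(3.6) > toy_density(4.4))
```

A larger bandwidth gives a smoother, flatter hill, and a smaller one gets closer to a spiky histogram.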
Whether or not you’re familiar with them, my year end lists calculate the average rating for the top albums of that year, giving more weight to users who use the site more, as measured by how many comments, lists, reviews, and ratings they have listed on their user pages. The counts of these things are square rooted to somewhat dampen the influence of very active users. A weighted average is calculated separately for each user-usage type, and then another weighted average is taken of these 4 weighted averages, with higher weight given to reviews and lists than to comments and ratings. At least to me, it makes sense that users with more reviews and lists are more thoughtful and careful about their ratings (or something, don’t @ me), so they are counted more than those with low counts.
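Here is a rough Python sketch of that two-stage weighted average as I’ve described it. The numbers in `TYPE_WEIGHTS` are hypothetical (the description above only says reviews and lists count more than comments and ratings, not by how much), and the function names, the dictionary layout for a user, and the decision to skip a usage type nobody has any activity in are all assumptions for the sake of the example.

```python
from math import sqrt

# Hypothetical weights: reviews and lists count more than comments and
# ratings, but the exact values here are invented for illustration.
TYPE_WEIGHTS = {"reviews": 2.0, "lists": 2.0, "comments": 1.0, "ratings": 1.0}


def weighted_mean(values, weights):
    """Plain weighted average of `values`."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def usage_weighted_rating(users, type_weights=TYPE_WEIGHTS):
    """Two-stage average: one weighted mean per usage type, then a mean of those."""
    ratings = [u["rating"] for u in users]
    per_type_means = []
    per_type_weights = []
    for usage_type, type_weight in type_weights.items():
        # Square root of the count dampens the pull of very active users.
        user_weights = [sqrt(u[usage_type]) for u in users]
        if sum(user_weights) == 0:
            continue  # nobody has this kind of activity; skip it (my choice)
        per_type_means.append(weighted_mean(ratings, user_weights))
        per_type_weights.append(type_weight)
    return weighted_mean(per_type_means, per_type_weights)


# Two made-up users: a heavy site user who gave a 5 and a light one who gave a 3.
toy_users = [
    {"rating": 5.0, "reviews": 4, "lists": 0, "comments": 9, "ratings": 16},
    {"rating": 3.0, "reviews": 0, "lists": 1, "comments": 1, "ratings": 4},
]
print(usage_weighted_rating(toy_users))
```

The result lands between the two ratings, pulled toward whichever user is more active in the usage types that carry more weight.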
The simulation of this data involves what is called a bootstrap. When you bootstrap data, you randomly sample it with replacement, calculate some statistic (like a mean, or in this case the user-usage weighted mean), and do it again N times to build a distribution of the error that exists within the data. In this post, for example, Go Farther in Lightness by Gang of Youths has 285 ratings by as many users. The way it is simulated is that 285 users’ ratings are randomly drawn, the user-usage weighted average is calculated, and then a new draw is extracted; this process is repeated until 1000 random draws have been extracted. Because the sampling is with replacement, some users’ ratings are not drawn and some are drawn more than once. If it were done without replacement, you would calculate the same statistic every time, and the only thing that would differ is the order of the ratings. For this post, this was done for each of the top 20 albums of the 2017 list, and these draws are made into density Joy Plots. (Code for this can be found here, and my GitHub repository includes the anonymized ratings for the top 20 albums.)
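The bootstrap loop itself is short. Below is a minimal Python sketch of it, not the R code linked above. To keep it self-contained it uses a plain mean as the default statistic where the real thing uses the user-usage weighted mean, and the function names and toy ratings are invented for illustration. The second function repeats the loop without replacement to show why that version is pointless.

```python
import random
from statistics import mean


def bootstrap(data, statistic=mean, n_draws=1000, seed=None):
    """Resample `data` with replacement `n_draws` times; return the statistics."""
    if not data:
        raise ValueError("need at least one data point")
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Same size as the original data, drawn WITH replacement, so some
        # ratings show up more than once and some not at all.
        resample = rng.choices(data, k=len(data))
        draws.append(statistic(resample))
    return draws


def shuffled_statistics(data, statistic=mean, n_draws=1000, seed=None):
    """Same loop WITHOUT replacement: every draw is just a reordering."""
    rng = random.Random(seed)
    return [statistic(rng.sample(data, k=len(data))) for _ in range(n_draws)]


# Made-up ratings purely for illustration, not real site data.
toy_ratings = [2.5, 3.0, 3.5, 4.0, 4.0, 4.5, 4.5, 5.0]
print(len(set(bootstrap(toy_ratings, seed=0))))            # many distinct means
print(len(set(shuffled_statistics(toy_ratings, seed=0))))  # reorderings only
```

The list returned by `bootstrap` is what gets turned into one hill of the Joy Plot: one list of 1000 draws per album.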
Using default settings of the R package, ggjoy, you get this.
Notice that some albums have very wide densities or “hills”, indicating that there is more uncertainty about their estimates. (Note that this is not the way I calculate the confidence intervals in the year end posts. These simulations take a minute or so to run per album, so I do something simpler, as outlined here.) An album’s hill may be wide because it has fewer ratings than the other albums and thus more uncertainty, because its ratings have higher variance than the other albums’, because it has a big disparity between high usage users and low usage users (so when some of the draws have more of one or the other it raises or lowers the rating a lot), or some mix of the three.
While we’re at it, what if we did that same plot with a different color palette? Like a blend of color palettes from the wesanderson R package, specifically from my favorite movies of his: The Royal Tenenbaums, Rushmore, and Moonrise Kingdom.
Finally, what if we brought it on home and did our best recreation of the Unknown Pleasures cover, based on code from this blog post?
If you did that, like Peter Saville before you, you would have taken something mundane like numbers and made them art.
I did the Joy Plot with data from the Joy Division Unknown Pleasures ratings, with each density a bootstrap of user-usage weighted averages with random weights given to each of the user-usage types. META
User SandwichBubble made this parody album cover with the above image. It is great.