Sputnikmusic - Statnik 7: Joy Plots

Statnik 7: Joy Plots

By macman76 Saturday January 6, 2018

Hello budding data viz lovers, and welcome to a post that will put the division in Joy Division. When it’s time to explain patterns and numbers to a layman, it’s usually done with the help of a graphic of some sort. Why? Because numbers are abstract and visual aids are useful tools to make the abstract concrete. But perhaps, too, it has another quality.

A great genre of argument is the “this seemingly boring and mundane activity is actually an art and full of wonders”. It’s a hit because it’s always true. We humans will assign meaning to anything we spend any amount of time doing. Anything and everything has some great novel/film/etc. about it and it has a r/ page full of memes. So data and statistics are no different. There’s r/DataIsBeautiful, some Neal Stephenson novels, and Moneyball. Without rethinking baseball management, scouting, and talent evaluation via the aid of analytics, how else would Brad Pitt have reconnected with his daughter (or gained a pioneer/icon status that has let him keep his job for so long despite little success)?

Data visualization bridges the gap from something that is obviously artistic, making visual images, to something that is only an art if you explain it, working with data. Thus, data visualization is an art in and of itself, and it’s something that is treated with care and respect in order to enrich the mundane into something full of meaning and import. And since people who work in data visualization can be said to have an appreciation of art, and enjoy popular music, a new type of data visualization emerged recently: the Joy Plot.

Some casual Wikipedia reading informs me that the designer of Joy Division’s Unknown Pleasures album cover found an image from a dissertation, rearranged the white and dark of the image, and used it for the album cover. Not unlike what I do in this series and will do in this post, the designer stole (?) someone’s work, made a few simple changes, and repackaged it. And he did it because it’s so freaking cool.

Since its appearance in a FiveThirtyEight article (and possibly before that, I don’t know), people have written code to make Joy Plots in R with available plotting tools, and an R package has been written and distributed specifically for this plot. The Joy Plots I will make will visualize density data of some Sputnik rating data from my year end rating rankings list. A density plot shows how a set of data is distributed. It looks like a hill or bell, and it’s higher on the y-axis in areas where more data points exist. If you are familiar with histograms, like the ones you see on review pages of albums on this site, density plots are essentially histograms that have been smoothed. Since the site already provides histograms of the ratings for albums (at least as long as an album has a review), it would be boring to recreate those with Joy Plots. So instead I will simulate some data from the ratings.
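To make the "smoothed histogram" idea concrete, here is a minimal sketch of a Gaussian kernel density estimate in plain Python (not the R used for the plots in this post). The ratings and the bandwidth of 0.3 are made up for illustration, and this is not necessarily the smoothing that ggjoy applies by default.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each data point contributes a small bell curve centered on itself;
        # the density is the sum of those bells, so it is tallest where
        # the data points pile up.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Made-up ratings, for illustration only.
ratings = [3.0, 3.5, 3.5, 4.0, 4.0, 4.0, 4.5, 4.5, 5.0]
density = gaussian_kde(ratings, bandwidth=0.3)

# Evaluate the curve on a grid from 1.0 to 5.0; plotting `curve` against
# `grid` gives one "hill".
grid = [1.0 + 0.1 * i for i in range(41)]
curve = [density(x) for x in grid]
```

A smaller bandwidth gives a bumpier hill that looks more like the raw histogram, and a larger one gives a smoother, flatter hill.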

Whether or not you’re familiar with them, my year end lists calculate the average rating for the top albums of that year while giving more weight to users that use the site more, as measured by how many comments, lists, reviews, and ratings they have listed on their user pages. The counts of these things are square rooted to somewhat marginalize the influence of very active users, and a weighted average is calculated separately for each user-usage type. Then another weighted average is taken of these 4 weighted averages, with higher weight given to the count of reviews and lists than comments and ratings. At least it makes sense to me that high counts there mean more thoughtful and careful users who are more considerate of their ratings (or something, don’t @ me), so they are counted more than those with low counts.
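As a rough sketch of that calculation in Python (not my actual R code): the `TYPE_WEIGHTS` values below are hypothetical, since all that is stated above is that reviews and lists count for more than comments and ratings, and the two users are made up. Skipping a usage type where every user has a count of zero is also an assumption of this sketch.

```python
import math

# Hypothetical weights for combining the four per-type averages.
TYPE_WEIGHTS = {"reviews": 2.0, "lists": 2.0, "comments": 1.0, "ratings": 1.0}

def weighted_mean(values, weights):
    """Weighted mean, or None if all weights are zero."""
    total = sum(weights)
    if total == 0:
        return None
    return sum(v * w for v, w in zip(values, weights)) / total

def usage_weighted_rating(users):
    """users: list of dicts with a 'rating' plus a count for each usage type."""
    ratings = [u["rating"] for u in users]
    type_means, type_weights = [], []
    for usage_type, type_weight in TYPE_WEIGHTS.items():
        # Square root the counts so very active users don't dominate.
        weights = [math.sqrt(u[usage_type]) for u in users]
        mean = weighted_mean(ratings, weights)
        if mean is not None:  # skip a type nobody has any counts for
            type_means.append(mean)
            type_weights.append(type_weight)
    # Second weighted average, across the per-type averages.
    return weighted_mean(type_means, type_weights)

# Two made-up users: a light user who rated 5.0 and a heavy user who rated 3.0.
users = [
    {"rating": 5.0, "reviews": 0, "lists": 0, "comments": 4, "ratings": 100},
    {"rating": 3.0, "reviews": 9, "lists": 4, "comments": 400, "ratings": 900},
]
result = usage_weighted_rating(users)
```

With these made-up numbers the result lands much closer to the heavy user's 3.0 than the plain average of 4.0 would.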

The simulation of this data involves what is called a bootstrap. When you bootstrap data, you randomly sample it with replacement, calculate some statistic (like a mean or in this case the user-usage weighted mean), and do it again N times to make a distribution of the error that exists within the data. In this post for example, Go Farther in Lightness by Gang of Youths has 285 ratings by as many users. The way it is simulated is that 285 users’ ratings are randomly drawn (because it is with replacement, some users’ ratings are not drawn and some are drawn more than once; if it were done without replacement then you would calculate the same statistic every time, and the only thing that would differ would be the order of the ratings), the user-usage weighted average is calculated, and then a new draw is extracted; this process is done until 1000 random draws have been extracted. For this post, this was done for each of the top 20 albums of the 2017 list, and these draws are made into density Joy Plots. (Code for this can be found here, and my github repository includes the anonymized ratings for the top 20 albums.)
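The bootstrap loop itself is short. Here is a minimal Python sketch (again, not the R code linked above). To stay self-contained it uses a plain mean as the statistic, where this post uses the user-usage weighted mean, and the 285 ratings are made up rather than the real ones for the album.

```python
import random
import statistics

def bootstrap(data, statistic, n_draws=1000, seed=None):
    """Resample `data` with replacement n_draws times and return the
    statistic computed on each resample."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_draws):
        # Sampling WITH replacement: some ratings are picked more than once
        # and others are left out, so each resample gives a different answer.
        resample = rng.choices(data, k=n)
        draws.append(statistic(resample))
    return draws

# 285 made-up ratings, matching only the count mentioned above.
ratings = [3.5] * 60 + [4.0] * 120 + [4.5] * 75 + [5.0] * 30

draws = bootstrap(ratings, statistics.mean, n_draws=1000, seed=7)
```

The 1000 values in `draws` are what gets turned into one density "hill" per album. The wider their spread, the less certain the estimate.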

Using default settings of the R package, ggjoy, you get this.

Notice that some albums have very wide densities or “hills”, indicating that there is more uncertainty about their estimates. (Note that this is not the way I calculate the confidence intervals in the year end posts; these simulations take a minute or so to run per album, so I do something simpler as outlined here.) A wide hill may be because the album has fewer ratings than the other albums, because its ratings have higher variance, because it has a big disparity between high usage users and low usage users (so when some of the draws have more of one or the other it raises or lowers the rating a lot), or some mix of the three.

While we’re at it, what if we did that same plot with a different color palette? Like a blend of color palettes from the wesanderson R package, specifically from my favorite movies of his: The Royal Tenenbaums, Rushmore, and Moonrise Kingdom.

Finally, what if we brought it on home and did our best recreation of the Unknown Pleasures cover, based on code from this blog post?

If you did that, like Peter Saville before you, you would have taken something mundane like numbers and made them art.

BONUS

I did the Joy Plot with data from the Joy Division Unknown Pleasures ratings, with each density a bootstrap of user-usage weighted averages with random weights given to each of the user-usage types. META

BONUS 2

User SandwichBubble made this parody album cover with the above image. It is great.

Get your parody songs ready.

29 Comments

macman76
01.06.18

excited about this one

Sniff
01.06.18

Nice one

theacademy
01.06.18

upvoted hard

Sinternet
01.06.18

give this man sputnik gold

klap
01.06.18

10/10

Sniff
01.06.18

#StopDiscriminatingPeopleWithManyRatings2018

macman76
01.06.18

added a bonus joy plot

Voivod
01.06.18

Awesome post, keep them coming!

bgillesp
01.07.18

Sweet! Are those 20 albums you chose from the charts or what?

Pheromone
01.07.18

haha this is great

macman76
01.07.18

Top 20 from my previous blog

bgillesp
01.07.18

Ah. Makes sense. This is prettier than joy division's cover

SandwichBubble
01.07.18

https://i.imgur.com/ZEI2nDj.png

macman76
01.07.18

@sandwich Excellent

SandwichBubble
01.07.18

I've been waiting for a guide to come
And weight my averages

macman76
01.07.18

There should be a weird al but for statistics rather than food

neekafat
01.07.18

I still don't get it :/

macman76
01.07.18

btj88
01.08.18

This is awesome. One question, I'm pretty green when it comes to data transformation, but was there a reason you chose to square root the counts instead of taking the log?

macman76
01.08.18

The simple reason is that some of these counts are 0 and the log of 0 is impossible, i could have made a function that returned zero if the input was zero but i followed the Rachel Ray principle instead, “measurements are arbitrary anyways so wing it and hope no one asks questions”

btj88
01.08.18

Right on, I forgot 0 was option for rating (who votes something a zero?) Rachel Ray was foxy, you should call this blog 30 Hour Coding.

Spacesh1p
01.08.18

Great stuff man.

macman76
01.08.18

Oh no, the ratings are all 1-5, it’s just that you may have no lists or reviews or comments (you’re guaranteed one rating given that you rated at least the album that the weighted average is being calculated for). Is 30 hour coding a reference to something?

macman76
01.08.18

Rachel Ray is a personal hero, fyi
https://www.rachaelray.com/2008/11/10/how-do-you-eyeball-it/

btj88
01.08.18

I was attempting a play on "30-Minute Meals", her old Food Network show, but it was a poor attempt at that, haha.

macman76
01.08.18

@btj lol, guess i don’t love ray that much to get the reference

Love the album art by sandwich, in case no one knows, he/she also made the statnik banner

Conmaniac
01.08.18

this is quite impressive but also such a strange, niche lil thing here. keep doin u macman

macman76
01.11.18

Bumping this once more since I like it

Also, new post soon

numbersforkids
01.13.18

good..https://youtu.be/_KJqhkcuu-k
