| Sputnikmusic
 


Hello fellow football fans, and welcome to a post that will assign numbers to your footy feelings. On the eve of the 2018 World Cup, I noticed that no one had made a “World Cup Thread”-type list, so I decided to start one. At some point, I realized that I could leverage the comments people were making into some sweet, sweet content. Specifically, I sought to measure the sentiment of each comment (positive or negative), which I could then summarize by World Cup team and by user.

Recently, I was googling sentiment analysis and came upon this post. The post describes, and provides code for, a model that uses the words in a tweet to predict its sentiment (1 being positive, 0 negative). The post is about a year old and uses tweets rather than Sputnik comments as training data, so the model may not exactly match the vocabulary and sincerity level of our own Sputnik soccer commenters. Regardless, I fit the sentiment model using the code from that post, scraped the comments from the World Cup 2018 list after the group stage concluded, and then scored each comment with the model. The model assigns each comment a value between 0 and 1, with 1 being positive and 0 being negative. Most comments land somewhere in the middle: about 75% are between .25 and .75, and about 92% are between .1 and .9.

So, after scoring every comment with that model, I searched the comments for mentions of each team. (Note: I used regular expressions to find them, so if you were playing the pronoun game when referring to a team, I could not detect which country the comment was about.) Many comments mentioned multiple teams, and some seemingly contained multiple sentiments, so I decided to assign a team label only to comments that mentioned exactly one country.
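
For the curious, here is a minimal Python sketch of that kind of team matching. It is not the script linked in this post: the team list is a small hypothetical subset, and the function names `teams_mentioned` and `single_team_label` are made up for illustration.

```python
import re

# Hypothetical subset of teams; the real list would hold all 32 sides.
TEAMS = ["england", "croatia", "argentina", "south korea", "saudi arabia"]

# One case-insensitive pattern per team, with word boundaries so a team
# name is not matched inside a longer word.
TEAM_PATTERNS = {
    team: re.compile(r"\b" + re.escape(team) + r"\b", re.IGNORECASE)
    for team in TEAMS
}


def teams_mentioned(comment):
    """Return every team whose name appears in the comment."""
    return [team for team, pattern in TEAM_PATTERNS.items() if pattern.search(comment)]


def single_team_label(comment):
    """Return the team if exactly one team is mentioned, otherwise None."""
    found = teams_mentioned(comment)
    return found[0] if len(found) == 1 else None
```

A comment like "they were awful" comes back as None, which is the pronoun-game problem in action.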

The model has a fairly good test AUC of .87 (meaning that it performs pretty well at predicting sentiment out of sample), but it isn’t perfect. For instance, here are some high-sentiment comments (>.9) that seem off:

Are there any proud prostitution moments?

let’s see what belgium does

Told ya

come off it will ya

Kane is awesome and thicc as fuck agreed

The model is also not a good lie detector, apparently. Here are some weird low-sentiment comments (<.15):

Modric tho

coutinho god damn

damn that was intense

Another problem: when someone repeats someone else’s comment but adds a “[2]”, the model gives a slightly different sentiment. For instance, “Argentina are shockingly awful” and “Argentina are shockingly awful [2]” get values of .099 and .125, respectively. All models have problems. You know what they say, “Every model is wrong, but some are shockingly awful god damn that was intense tho.”

(The code and data for this are here. To run the script, you will need to download the tweet data, which is linked here.)

The following is a plot of the sentiment of each comment, in the order the comments were made, with a flag placed on each point whose comment contained the name of a country. (Flag images courtesy of gosquared.)

So much England

After adjusting for uncertainty by using the lower bound of the 95% confidence interval, the following table ranks each country by sentiment. (The sentiment column contains the average sentiment, which is NOT used directly for ranking; it is adjusted by the number of comments, n_comments.)

l95_Rank  team          sentiment  n_comments
1         england       0.606      65
2         croatia       0.663      8
3         argentina     0.531      31
4         sweden        0.587      13
5         portugal      0.538      21
6         south korea   0.611      9
7         germany       0.504      28
8         senegal       0.626      7
9         spain         0.603      8
10        switzerland   0.631      6
11        belgium       0.565      10
12        tunisia       0.560      10
13        denmark       0.558      9
14        egypt         0.587      7
15        mexico        0.556      9
16        brazil        0.504      14
17        japan         0.536      8
18        panama        0.492      10
19        iceland       0.501      9
20        iran          0.444      11
21        russia        0.546      4
22        peru          0.529      3
23        france        0.351      10
24        uruguay       0.523      2
25        australia     0.514      2
26        colombia      0.352      7
27        morocco       0.614      1
28        nigeria       0.489      2
29        poland        0.335      5
30        saudi arabia  0.298      1
31        costa rica    0.144      1
32        serbia        0.001      1
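
For anyone wondering how a lower-bound ranking can put a team with a lower average on top, here is a rough Python sketch. The exact interval formula used for the table is not stated in this post, so this assumes a normal approximation with one standard deviation pooled across all comments; the function name `l95_ranking` and the scores are made up for illustration.

```python
import math
from statistics import mean, pstdev


def l95_ranking(scores_by_team):
    """Rank teams by the lower bound of an approximate 95% CI of the mean."""
    all_scores = [s for scores in scores_by_team.values() for s in scores]
    # Assumption: a single spread shared by every team, so that teams with
    # one comment still get an interval.
    pooled_sd = pstdev(all_scores)
    rows = []
    for team, scores in scores_by_team.items():
        n = len(scores)
        avg = mean(scores)
        lower = avg - 1.96 * pooled_sd / math.sqrt(n)
        rows.append((team, round(avg, 3), n, lower))
    # Highest lower bound first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Made-up scores: one team with many comments, one with a single comment.
made_up_scores = {
    "many_comments": [0.5, 0.7] * 8,
    "one_comment": [0.65],
}
ranking = l95_ranking(made_up_scores)
```

Here the team with 16 comments ranks first even though its average (0.6) is below the single-comment team's (0.65), because fewer comments mean a wider interval and a lower bound. The same effect shows up with England and Croatia in the table.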

After the uncertainty adjustment, the following tables rank the top 20 most positive users, followed by the top 20 most negative. (84 unique users have commented so far.)

l95_Rank  user               sentiment  n_comments
1         osmark86           0.586      85
2         hal1ax             0.659      20
3         zakalwe            0.583      61
4         DoofusWainwright   0.567      85
5         anatelier          0.663      14
6         Egarran            0.554      62
7         Flugmorph          0.558      55
8         deezer666          0.546      68
9         pypypymble         0.533      63
10        RadicalEd          0.562      37
11        Sniff              0.553      41
12        Winesburgohio      0.626      14
13        DDDeftoneDDD       0.589      19
14        jagride            0.665      9
15        Maco097            0.583      17
16        Casavir            0.636      10
17        anarchistfish      0.535      27
18        Demon of the Fall  0.539      25
19        iglu               0.534      26
20        Evreaia            0.609      11

u95_Rank  user               sentiment  n_comments
1         Dewinged           0.434      21
2         rabidfish          0.469      32
3         Sinternet          0.483      29
4         Alastor            0.406      12
5         pypypymble         0.533      63
6         deezer666          0.546      68
7         DoofusWainwright   0.567      85
8         Egarran            0.554      62
9         RunOfTheMill       0.417      10
10        Flugmorph          0.558      55
11        Keyblade           0.334      5
12        osmark86           0.586      85
13        Sniff              0.553      41
14        TheNotrap          0.379      6
15        zakalwe            0.583      61
16        anarchistfish      0.535      27
17        Doctuses           0.466      12
18        iglu               0.534      26
19        RadicalEd          0.562      37
20        Kusangii           0.441      9

Some users appear in both the most positive and most negative user rankings. That is because they commented a lot, so their confidence intervals are narrow and the uncertainty adjustment pushes them toward the top of both lists. Or it could be math demonstrating its appreciation for the highs and lows, the drama, and the beauty of the game. Math, it contains multitudes.
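
Here is a small Python sketch of how one user can top both lists. It assumes the positive list ranks by the highest lower bound and the negative list by the lowest upper bound, which is my reading of the l95_Rank and u95_Rank column names. The users, the per-comment spread `sd=0.25`, and the helper `interval` are all made up for illustration.

```python
import math


def interval(avg, n, sd=0.25):
    """Approximate 95% CI for a mean; sd is an assumed per-comment spread."""
    half_width = 1.96 * sd / math.sqrt(n)
    return avg - half_width, avg + half_width


# Made-up users: (average sentiment, number of comments).
users = {
    "heavy_poster": (0.55, 100),
    "cheerful_lurker": (0.65, 4),
    "grumpy_lurker": (0.45, 4),
}

bounds = {name: interval(avg, n) for name, (avg, n) in users.items()}

# Most positive: highest lower bound. Most negative: lowest upper bound.
most_positive = max(bounds, key=lambda name: bounds[name][0])
most_negative = min(bounds, key=lambda name: bounds[name][1])
```

With these numbers, heavy_poster comes out as both the most positive and the most negative user: the interval around 0.55 is so narrow that its lower bound beats the cheerful lurker's and its upper bound sits below the grumpy lurker's.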


Update 7/15/2018

Following the conclusion of the knockout stage of the WC, here are the updated team sentiment tables.

l95_Rank  team          sentiment  n_comments
1         england       0.572      145
2         croatia       0.571      48
3         brazil        0.555      35
4         belgium       0.552      36
5         sweden        0.590      20
6         portugal      0.561      25
7         argentina     0.536      34
8         germany       0.517      32
9         south korea   0.621      9
10        france        0.469      51
11        senegal       0.631      7
12        denmark       0.510      20
13        tunisia       0.544      13
14        switzerland   0.633      6
15        spain         0.533      13
16        mexico        0.525      13
17        japan         0.500      17
18        egypt         0.588      7
19        russia        0.476      18
20        colombia      0.435      21
21        uruguay       0.470      13
22        panama        0.495      10
23        iceland       0.505      9
24        iran          0.438      11
25        peru          0.435      10
26        australia     0.525      2
27        morocco       0.620      1
28        nigeria       0.494      2
29        poland        0.342      5
30        serbia        0.272      3
31        saudi arabia  0.332      1
32        costa rica    0.130      1

Most positive users:

l95_Rank  user               sentiment  n_comments
1         zakalwe            0.602      140
2         hal1ax             0.616      91
3         osmark86           0.605      108
4         DoofusWainwright   0.570      112
5         Egarran            0.548      112
6         jagride            0.627      26
7         RadicalEd          0.546      107
8         danielcardoso      0.857      4
9         Sniff              0.571      54
10        deezer666          0.525      159
11        anatelier          0.645      18
12        Flugmorph          0.559      63
13        pypypymble         0.543      83
14        Darius             0.561      44
15        anarchistfish      0.505      110
16        iglu               0.535      56
17        Winesburgohio      0.620      16
18        AngryJohnny        0.623      15
19        Demon of the Fall  0.537      46
20        adr                0.547      39

Most negative users:

u95_Rank  user               sentiment  n_comments
1         rabidfish          0.462      72
2         anarchistfish      0.505      110
3         deezer666          0.525      159
4         Ryus               0.466      31
5         DoofDoof           0.513      62
6         Sinternet          0.492      38
7         RadicalEd          0.546      107
8         Maco097            0.493      37
9         Egarran            0.548      112
10        Dewinged           0.490      35
11        Alastor            0.411      13
12        DominionMM1        0.486      31
13        pypypymble         0.543      83
14        Thalassic          0.487      29
15        iglu               0.535      56
16        Evreaia            0.510      34
17        DoofusWainwright   0.570      112
18        Demon of the Fall  0.537      46
19        Flugmorph          0.559      63
20        Valkoor952         0.459      14

Also, here is a plot of the sentiment by team for the knockout stages:

The "Sooo much England"-ness continues





macman76
06.29.18
ill eventually get back to part 2 of statnik 12

if you want, i can assign a sentiment to any comment you give, just tell me you want that

Papa Universe
06.29.18
"30 saudi arabia 0.298 1"
I have a feeling that this is that one joke comment I made.

Sinternet
06.29.18
oh shit i forgot about the thread. don't know why mine are so negative tho :^(

macman76
06.29.18
i read them, maybe its your cursing, cuz otherwise you seem to make a lot of positive or simply neutral comments

neekafat
06.29.18
“Every model is wrong, but some are shockingly awful god damn that was intense tho.”
Lmao

bgillesp
06.29.18
Soccer is dumb

macman76
06.29.18
@neek I really missed an opportunity to add a [2] to that joke

RogueNine
06.29.18
Lol yeah Neeka that's one of the best sentences I've ever read on Sputnik.

neekafat
06.29.18
[2]

iglu
06.29.18
I made both lists lol

Storm In A Teacup
06.29.18
Macman strikes again!

bgillesp
06.29.18
I hate soccer. Can I make the neg list?

bgillesp
06.29.18
Germany plays like poopoo and Spain isn't good

macman76
06.29.18
@bgill youll be number 1 on the big hater list, playa hater hall of fame

butcherboy
06.29.18
so few comments about france, who are going to win this thing, if they manage the tiniest bit of discipline.. better talent than any other team this time around..

bgillesp
06.29.18
France pees in their pants during games

anarchistfish
06.29.18
A for effort

bgillesp
06.29.18
Teams from countries that start with A are not good at playing.

macman76
07.01.18
Bump, Ill probably make an update to the post following the conclusion of the final of the WC

macman76
07.16.18
alright, post is updated with the latest, WC was fun and I have learned how to wake up super early, thanks free TV sports

Dewinged
07.16.18
Colossal task mac, nicely done. But wait, I am the most postive person around, how can I only be in the negative section? lol

macman76
07.16.18
This is a wake up call Dewi, everyone has been saying that your word usage is >.5 positive.

bgillesp
07.16.18
Fishes hate soccer

anarchistfish
07.16.18
i like how i jumped up to one of the most negative users after my team won



Site Copyright 2005-2017 Sputnikmusic.com