Explanations
email addresses and Twitter handles
references to social media platforms, particularly Twitter
Negative Logits
Token         Logit
Exhibit       -0.68
idols         -0.66
Appendix      -0.65
trance        -0.64
brackets      -0.63
Pharaoh       -0.62
aby           -0.62
ingly         -0.60
prol          -0.59
Photoshop     -0.59
Positive Logits
Token         Logit
afort          0.78
iott           0.75
UNCLASSIFIED   0.73
toll           0.72
jon            0.71
endez          0.70
orthern        0.70
weet           0.69
chell          0.69
ahon           0.69
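A minimal sketch of how top positive and negative logit lists like the ones above are commonly produced. It assumes each value is the feature's direct logit effect, meaning the dot product of the feature's decoder direction with each token's unembedding vector (ignoring any final LayerNorm). The names `w_dec` and `unembed` and the toy numbers are hypothetical and are not this dashboard's actual data or code.

```python
from typing import Dict, List, Tuple


def logit_effects(w_dec: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct logit effect per token: dot(w_dec, that token's unembedding vector).

    Assumption: no final LayerNorm or other scaling is applied.
    """
    return {tok: sum(d * u for d, u in zip(w_dec, vec)) for tok, vec in unembed.items()}


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with a hypothetical 2-dimensional residual stream.
w_dec = [1.0, 0.0]
unembed = {"a": [0.5, 1.0], "b": [-0.2, 3.0], "c": [0.9, -1.0]}
pos, neg = top_logits(logit_effects(w_dec, unembed), k=1)
print(pos, neg)  # [('c', 0.9)] [('b', -0.2)]
```

Sorting once and slicing from both ends gives both lists in a single pass; a real vocabulary would use a vectorized matrix product and a top-k routine instead of a Python dict.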
Activation Density: 0.140%
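Activation density is usually read as the fraction of sampled tokens on which the feature is active, so 0.140% corresponds to roughly 14 firing tokens per 10,000. The sketch below assumes "active" means an activation strictly above zero; the function name and threshold parameter are hypothetical, and the dashboard's exact sampling and threshold are not stated here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: a feature "fires" on a token when its activation exceeds 0.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 14 firing tokens out of 10,000.
acts = [0.0] * 9986 + [1.3] * 14
density = activation_density(acts)
print(f"{density:.3%}")
```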