INDEX
Explanations
social or political commentary and criticism
Negative Logits
rn        -0.65
ayette    -0.65
plain     -0.63
hur       -0.61
ASC       -0.61
irlf      -0.61
aith      -0.61
eminent   -0.59
planes    -0.58
cham      -0.58
Positive Logits
2016      1.14
2015      1.11
2017      1.10
2014      1.09
2018      1.06
2010      1.03
2012      1.02
2013      1.00
2008      0.93
2011      0.93
Activations Density: 0.024%