INDEX
Explanations
mentions of political figures and their actions related to social issues
Negative Logits
addCriterion    -0.16
endale          -0.16
æ³ī             -0.15
249             -0.15
\↵              -0.14
icontrol        -0.14
ktor            -0.14
\↵              -0.14
otor            -0.14
stÃŃ            -0.14
Positive Logits
ouch            0.15
aliz            0.14
&o              0.14
bracket         0.14
Enumerator      0.14
oodoo           0.14
vel             0.13
omin            0.13
Duch            0.13
rad             0.13
Activations Density 0.491%