INDEX
Explanations
phrases related to accusations of racism and the dynamics of political opposition
Negative Logits
ovaly        -0.18
366          -0.16
vyk          -0.15
ëĬ¥          -0.15
ysi          -0.15
Rosenstein   -0.14
ÑĸÑĢ         -0.14
lider        -0.14
indeb        -0.14
inkel        -0.13
Positive Logits
trait        0.24
li           0.20
unp          0.18
dangerous    0.18
Tra          0.17
li           0.17
trait        0.17
unfit        0.17
closet       0.17
coll         0.15
Activations Density: 0.160%