Explanations
references to political figures and their statements or actions
Negative Logits
emoc        -0.17
ahun        -0.15
ropp        -0.15
LBL         -0.14
roadcast    -0.14
nel         -0.14
aidu        -0.14
yearly      -0.14
DNA         -0.14
.utf        -0.14
Positive Logits
White       0.18
DAG         0.17
izza        0.16
White       0.16
Proud       0.15
åħ¼         0.15
EK          0.14
Vanity      0.14
ozem        0.14
NR          0.14
Activations Density: 0.029%