INDEX
Explanations
Terms related to anti-establishment sentiment and negativity toward various political and social issues.
Negative Logits
Token    Logit
erate    -0.19
hetto    -0.18
hop      -0.15
hover    -0.15
oader    -0.14
ISTIC    -0.14
kest     -0.13
olmak    -0.13
олю      -0.13
ifen     -0.13
Positive Logits
Token       Logit
Chr         0.14
/in         0.14
uzzi        0.14
olicited    0.14
убли        0.14
/off        0.14
yw          0.13
Mens        0.13
ucci        0.13
-Semitic    0.13
Activations Density: 0.037%