Explanations
statements related to political criticism and advocacy
Negative Logits
igy        -0.16
ctl        -0.14
WITHOUT    -0.13
.dy        -0.13
illy       -0.13
mez        -0.13
τρ         -0.13
Altern     -0.13
Ör         -0.13
ofi        -0.13
Positive Logits
nor        1.13
Nor        0.96
nor        0.88
Nor        0.85
NOR        0.65
sondern    0.53
neither    0.47
anymore    0.45
بلکه       0.44
нор        0.39
Activations Density 0.273%