Explanations
references to individuals' political roles and affiliations
Negative Logits
тро        -0.16
kest       -0.14
visa       -0.14
ラック     -0.14
-visible   -0.14
ights      -0.13
Bounty     -0.13
плат       -0.13
↵↵         -0.13
owell      -0.13
Positive Logits
Chef       0.20
Refer      0.20
stell      0.20
Le         0.20
Dire       0.19
Che        0.19
Sek        0.18
refer      0.18
chef       0.18
che        0.17
Activations Density 0.019%