Explanations
references to political parties and election-related terminology
Negative Logits
fran      -0.17
ić        -0.15
chwitz    -0.15
.TXT      -0.14
Zus       -0.14
arov      -0.14
pedia     -0.14
TAR       -0.14
Bush      -0.13
OTT       -0.13
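Token strings such as `iÄĩ` or `ÙĪØ§Ø¬` are how GPT-2-style byte-level BPE vocabularies store non-ASCII tokens: every byte is mapped to a printable character. The sketch below assumes the model behind this dashboard uses such a tokenizer (the model is not named here). It rebuilds the byte table, mirroring `bytes_to_unicode` from OpenAI's GPT-2 tokenizer code, and maps a token back to UTF-8 text. A token that holds only part of a multi-byte character, such as `ãĥ`, has no complete character to show, so the sketch emits the Unicode replacement character for it. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the reversible byte -> printable-character table used by GPT-2 BPE."""
    # Bytes that are already printable characters map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level BPE token string back to readable text (hypothetical helper)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token may contain only part of a multi-byte character; show U+FFFD for it.
    return raw.decode("utf-8", errors="replace")


for tok in ["iÄĩ", "ÙĪØ§Ø¬", "ãĥ", "Ġhello"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

The table is built once and inverted, so decoding is a per-character dictionary lookup followed by an ordinary UTF-8 decode.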
Positive Logits
ayo       0.15
ius       0.15
واج       0.15
bum       0.14
bru       0.14
ãĥ        0.13
بری       0.13
aya       0.13
елем      0.13
abi       0.13
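A common way to produce logit lists like these is to project the feature's decoder direction onto the model's unembedding matrix and take the most negative and most positive entries. The following pure-Python sketch assumes that definition. Some tools also apply the final layer norm first, and this sketch leaves that step out. `feature_logit_effects` and the toy weights are hypothetical and are not taken from this dashboard.

```python
import random


def feature_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    w_dec_row: the feature's decoder direction, length d_model.
    w_u: unembedding matrix as d_model rows, each with len(vocab) entries.
    Returns (most_negative, most_positive) lists of (token, score) pairs.
    """
    d_model, vocab_size = len(w_dec_row), len(vocab)
    # score[j] = w_dec_row . W_U[:, j]  (a vector-matrix product)
    scores = [
        sum(w_dec_row[r] * w_u[r][j] for r in range(d_model))
        for j in range(vocab_size)
    ]
    order = sorted(range(vocab_size), key=lambda j: scores[j])  # ascending
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    positive = [(vocab[j], scores[j]) for j in order[::-1][:k]]  # largest first
    return negative, positive


# Toy usage with random weights (illustrative only).
rng = random.Random(0)
d_model, vocab_size = 8, 30
vocab = [f"tok{i}" for i in range(vocab_size)]
w_u = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]
w_dec_row = [rng.gauss(0, 1) for _ in range(d_model)]
neg, pos = feature_logit_effects(w_dec_row, w_u, vocab, k=3)
print("negative:", neg)
print("positive:", pos)
```

Because the score is a plain dot product, it measures only the feature's direct path to the output. It ignores any effect routed through later layers.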
Activation density: 0.091%
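Activation density is usually reported as the share of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above zero and that density is counted over a flat list of per-token activations. The function name and the toy data are hypothetical. Under that reading, 0.091% means the feature fires on roughly 9 tokens in every 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy example: 1 firing token out of 2,000 sampled tokens.
acts = [0.0] * 1999 + [3.2]
print(f"{activation_density(acts):.3%}")
```

A low density like this one indicates a sparse, fairly specific feature. The exact figure depends on the threshold and on the text sample used.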