INDEX
Explanations
references to countries and their corresponding codes or identifiers
Negative Logits
Token      Logit
irt        -0.16
unter      -0.15
mouth      -0.15
frei       -0.14
terdam     -0.14
estic      -0.14
sons       -0.14
zan        -0.14
zin        -0.14
illion     -0.14
Positive Logits
Token      Logit
uguay      0.18
anness     0.18
dụng       0.17
prising    0.17
ndef       0.16
844        0.15
VERRIDE    0.15
ustos      0.15
hone       0.15
esa        0.15
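A minimal sketch of how per-token logit scores like the ones above might be produced, assuming (this is not stated on the page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The decoder direction, tokens, and vectors below are hypothetical toy values, and `logit_effects` and `top_and_bottom` are illustrative helpers, not part of any dashboard API.

```python
def logit_effects(decoder_dir, unembed):
    """Score each token by dot(decoder direction, token unembedding vector).

    Assumption: this projection is how the dashboard's logit values arise.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(scores, k):
    """Return the k most negative (ascending) and k most positive (descending) tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Hypothetical 2-dimensional example with three tokens.
decoder = [1.0, -0.5]
unembed = {"a": [0.2, 0.1], "b": [-0.1, 0.2], "c": [0.3, -0.2]}
neg, pos = top_and_bottom(logit_effects(decoder, unembed), k=2)
print([t for t, _ in neg], [t for t, _ in pos])
```

With these toy values, `b` scores most negative and `c` most positive, mirroring how the two lists above are ordered from the strongest effect outward.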
Activation Density: 0.296%
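A minimal sketch of an activation-density calculation, assuming density means the percentage of tokens on which the feature's activation is above zero (the page does not define it). The `activation_density` helper and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = fraction of tokens with activation > threshold, times 100.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 1,000 tokens, of which 3 fire.
acts = [0.0] * 997 + [1.2, 0.4, 2.7]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```

Under this definition, a density of 0.296% means the feature fires on roughly 3 tokens in every 1,000.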