INDEX
Explanations
references to countries and international organizations
Negative Logits
atik     -0.16
acon     -0.15
lington  -0.14
inand    -0.14
olumn    -0.14
erton    -0.13
Tot      -0.13
rie      -0.13
Tot      -0.13
aber     -0.13
Positive Logits
елен     0.20
aille    0.15
uez      0.15
ixe      0.15
etiyle   0.15
hlen     0.14
Invoke   0.14
eca      0.14
رك       0.14
ailles   0.14
Activations Density: 0.016%