Explanations
references to countries and their geographical attributes or political situations
Negative Logits
nev        -0.14
smith      -0.14
.soft      -0.14
æĬ         -0.13
951        -0.13
šk         -0.13
auge       -0.13
otr        -0.13
kup        -0.13
_Native    -0.13
Positive Logits
ese          0.23
istan        0.21
åħ±åĴĮåĽ½    0.20
ian          0.18
-born        0.18
Republic     0.18
ischer       0.17
ç±į          0.17
erland       0.17
national     0.17
Activations Density 0.121%
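
Tokens such as åħ±åĴĮåĽ½ and ç±į in the lists above look like byte-level BPE vocabulary entries rather than encoding damage. The source does not say which tokenizer produced them, so the following is only a sketch under the assumption that they use the GPT-2-style byte-to-unicode table. The helper names (bytes_to_unicode, decode_token) are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the GPT-2-style map from byte values (0-255) to printable characters.

    Assumption: the dashboard's tokenizer uses this byte-level scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level token string back to readable text.

    Tokens containing characters outside the table are returned unchanged.
    Incomplete UTF-8 sequences decode to the replacement character U+FFFD.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["åħ±åĴĮåĽ½", "ç±į", "Republic", "æĬ", "šk"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, åħ±åĴĮåĽ½ decodes to 共和国 ("republic") and ç±į to 籍 (a character used in words for nationality or registry), both consistent with the explanation given for this feature. æĬ decodes to an incomplete UTF-8 sequence, which suggests it is a fragment of a multi-byte character rather than a standalone word.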