Explanations
references to specific countries or nationalities
Negative Logits
Token      Logit
velt       -0.15
bersome    -0.15
σμα        -0.15
lech       -0.15
ivalent    -0.14
701        -0.14
frei       -0.13
wner       -0.13
poons      -0.13
ağa        -0.13
Positive Logits
Token      Logit
ian        0.20
can        0.18
ifornia    0.18
ican       0.17
-Russian   0.17
стан       0.17
bian       0.17
-American  0.16
ish        0.16
589        0.16
Activations Density 0.182%