INDEX
Explanations
words associated with nationalities and geographical entities
Negative Logits
mits      -0.15
.sax      -0.14
hetto     -0.14
ets       -0.14
pedia     -0.13
Asia      -0.13
illez     -0.13
leitung   -0.13
ion       -0.13
Asia      -0.13
Positive Logits
apanese   0.19
wegian    0.19
icense    0.18
merican   0.16
istani    0.16
arian     0.16
Canadian  0.16
namese    0.15
american  0.15
American  0.15
Activations Density: 0.081%