Explanations
names that are related to cultural or ethnic identities
Negative Logits
uty           -0.15
itecture      -0.14
istrovství    -0.14
파            -0.14
rez           -0.14
ulet          -0.14
umnos         -0.14
_CID          -0.14
es            -0.14
getter        -0.14
Positive Logits
ellaneous     0.27
rael          0.24
antro         0.20
consin        0.18
abella        0.18
patrick       0.17
beth          0.17
lere          0.16
apore         0.16
ahkan         0.16
Activations Density 0.086%