INDEX
Explanations
references to specific ethnic or cultural groups and their characteristics
Negative Logits
Heller    -0.19
PELL      -0.16
atz       -0.15
ãĥ¼ãĥ³    -0.15
orrow     -0.14
UDGE      -0.14
mailer    -0.14
rá        -0.14
UNK       -0.13
evi       -0.13
Positive Logits
men       0.26
ic        0.24
Turk      0.16
meni      0.16
μεν       0.16
emen      0.16
ican      0.15
oman      0.15
Amen      0.15
iband     0.15
Activations Density 0.008%