Explanations
References to national or ethnic identities related to individuals
Negative Logits
oku         -0.17
anela       -0.15
£¼          -0.15
vinc        -0.14
358         -0.14
ialect      -0.14
elin        -0.14
utrecht     -0.14
ocs         -0.14
amera       -0.14
Positive Logits
alive        0.15
æ¶           0.14
yasal        0.14
NotAllowed   0.14
//===        0.14
æĻ¯          0.14
VERR         0.14
Lump         0.14
blindness    0.14
th           0.13
Activation Density: 0.024%