INDEX
Explanations
occurrences of gender-specific nouns and verbs related to criminal behavior
Negative Logits
anni    -0.16
arel    -0.15
ç¿      -0.15
388     -0.14
058     -0.14
achs    -0.14
utsche  -0.14
BÃłi    -0.14
èij     -0.13
दर      -0.13
Positive Logits
кав        0.14
apore      0.14
oron       0.14
çķª        0.14
erves      0.14
.ix        0.14
[assembly  0.14
ãģ£ãģ¡     0.14
ahren      0.14
rana       0.13
Activations Density 0.195%