Explanations
Words and phrases that convey comparisons and evaluations of individuals or groups, particularly in relation to social or personal attributes.
Negative Logits
civilian    -0.21
civilians   -0.18
vrouw       -0.16
frauen      -0.16
vrouwen     -0.15
vero        -0.15
ufen        -0.15
_UNICODE    -0.15
Frauen      -0.15
women       -0.15
Positive Logits
丈夫        0.20
males       0.20
雄          0.20
masculine   0.20
male        0.20
чолов       0.18
male        0.17
ová         0.17
นาย         0.16
Husband     0.16
Activation Density: 1.699%