Explanations
references to men, their roles, and their representation within various contexts
Negative Logits
oğlu          -0.15
isoft         -0.15
Boy           -0.14
EEK           -0.14
.Aggressive   -0.14
LING          -0.13
Brother       -0.13
iblings       -0.13
lang          -0.13
ICO           -0.13
Positive Logits
women         0.86
woman         0.77
women         0.69
Women         0.65
Women         0.62
WOM           0.59
Woman         0.59
女人          0.57
woman         0.57
ladies        0.54
Activations Density 0.347%