Explanations
references to male figures or roles in various contexts
Negative Logits
Token            Logit
.}}              -0.78
Theſe            -0.77
^(@)             -0.74
"]="             -0.72
expandindo       -0.71
']")             -0.70
%;">             -0.70
beginnetje       -0.69
)._              -0.69
itſelf           -0.68
Positive Logits
Token            Logit
Men              1.01
men              0.95
Man              0.84
Men              0.84
MEN              0.80
man              0.80
who              0.80
Man              0.70
interopRequire   0.67
mans             0.66
Activations Density 0.100%
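
The page does not define the density figure. A common reading is that it is the fraction of sampled tokens on which the feature's activation is nonzero. The sketch below shows that reading only. The function name, the zero threshold, and the sample data are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under this reading, a density of 0.100% means the feature is active on roughly one token in a thousand, which fits a feature that responds narrowly to tokens such as "Men" and "man".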