INDEX
Explanations
mentions of male and female titles or honorifics
Negative Logits
Token                              Logit
20439                              -0.78
actionGroup                        -0.77
appre                              -0.75
tremend                            -0.72
exting                             -0.70
wheelchair                         -0.69
eleph                              -0.68
pione                              -0.67
rawdownloadcloneembedreportprint   -0.63
catentry                           -0.63
POSITIVE LOGITS
Token                              Logit
.,                                 1.07
./                                 0.89
.,"                                0.83
.;                                 0.81
.?                                 0.81
Blasio                             0.76
iggins                             0.75
.-                                 0.73
.),                                0.70
izer                               0.70
Activations Density: 0.022%