INDEX
Explanations
Pronouns indicating gender
References to gendered pronouns, particularly "he" and "she".
Negative Logits
Mub        -0.70
Joy        -0.70
Vil        -0.69
Vive       -0.69
Bacon      -0.65
Delicious  -0.63
CLR        -0.61
Rusty      -0.61
Yar        -0.61
Downs      -0.60
Positive Logits
self       0.96
own        0.80
selves     0.77
itage      0.73
issance    0.72
agon       0.70
spouse     0.69
acht       0.66
ynasty     0.66
gdala      0.66
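Dashboards of this kind commonly rank output tokens by a feature's direct effect on the logits: the feature's decoder direction projected through the model's unembedding matrix, with the largest values listed as positive logits and the smallest as negative logits. The sketch below assumes that method; the names `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom`, and all toy numbers, are hypothetical and not taken from this page.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through the unembedding.

    decoder_dir has length d_model; unembed has d_model rows and vocab_size columns.
    Returns one direct logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Hypothetical toy setup: d_model = 2 and a four-token vocabulary.
vocab = ["self", "own", "Joy", "Mub"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.8, 0.6, -0.4, -0.5],
    [0.2, 0.4, -0.2, -0.4],
]

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

In a real model the same projection runs over the full vocabulary, which is why subword fragments such as "itage" or "gdala" can appear in the lists alongside whole words.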
Activation density: 0.034%
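Activation density is usually read as the share of sampled tokens on which the feature's activation is nonzero. The sketch below assumes that definition; the function name `activation_density`, its `threshold` parameter, and the sample values are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical toy sample: the feature fires on 2 of 8 tokens.
sample = [0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0]
print(f"{activation_density(sample):.1f}%")  # prints 25.0%

# A density of 0.034% corresponds to roughly 34 active tokens per 100,000.
print(round(0.034 / 100 * 100_000))  # prints 34
```

Under this reading, the feature above is very sparse: it fires on about one token in every 3,000.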