Explanations
references to gender-specific pronouns, particularly "she" and "her"
Negative Logits
Joy          -0.77
Rusty        -0.71
CLR          -0.70
Vil          -0.69
Vaugh        -0.66
Airl         -0.65
Vive         -0.64
COUR         -0.64
Orders       -0.63
Settlement   -0.63
Positive Logits
self         1.04
selves       0.80
own          0.72
athed        0.71
selves       0.70
acht         0.70
elf          0.69
needs        0.69
affer        0.67
itage        0.66
Activations Density: 0.018%