Explanations
mentions of "Mrs." followed by a name or title, indicating references to women or female characters
Negative Logits
orse       -0.15
orses      -0.15
ante       -0.15
ahn        -0.14
NSK        -0.14
sám        -0.14
vy         -0.14
stret      -0.13
adiens     -0.13
ons        -0.13
Positive Logits
Doub        0.20
Universe    0.19
Claus       0.18
Grund       0.18
enschaft    0.17
Whats       0.17
liable      0.17
doub        0.17
seau        0.17
elijke      0.16
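Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the k largest and k smallest entries. The sketch below shows that idea in plain Python. It assumes a toy decoder vector and a toy token-to-unembedding mapping, and the function names and numbers are hypothetical: they are not this feature's real weights or the dashboard's actual code.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the (hypothetical) feature decoder direction with each token's
    (hypothetical) unembedding column to get its direct effect on that logit."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return positive, negative


# Toy example: a 2-dimensional residual stream and three tokens.
decoder_dir = [1.0, -0.5]
unembed = {"a": [0.2, 0.0], "b": [0.0, 0.4], "c": [0.1, 0.1]}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(positive, negative)
```

A real dashboard would run the same top-k selection over the full vocabulary, usually with a tensor library rather than Python loops.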
Activations Density 0.040%
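An activation density of 0.040% means the feature fires on roughly 4 of every 10,000 tokens. The sketch below assumes density is the percentage of sampled tokens whose activation is strictly above zero. The helper name and the threshold parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition; the dashboard's exact rule may differ)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 10,000 tokens, of which 4 have a positive activation.
acts = [0.0] * 9996 + [1.3, 0.7, 2.1, 0.4]
print(f"{activation_density(acts):.3f}%")  # prints "0.040%"
```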