Explanations
mentions of male characters or pronouns
Negative Logits
isay      -0.15
ossier    -0.15
472       -0.15
oss       -0.14
carrying  -0.14
318       -0.14
rab       -0.14
696       -0.13
dzi       -0.13
neph      -0.13
Positive Logits
орот      0.16
also      0.16
ONO       0.15
違        0.15
IGH       0.15
meg       0.15
mere      0.14
Also      0.14
ALSO      0.14
ONTAL     0.14
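Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the tokens with the largest and smallest resulting values. The sketch below assumes that procedure. The helper names `project_to_vocab` and `top_logits`, along with the toy `direction`, `unembedding`, and `vocab` values, are hypothetical stand-ins, not this feature's actual weights.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def project_to_vocab(direction: Sequence[float],
                     unembedding: Sequence[Sequence[float]]) -> List[float]:
    """Return one logit per vocabulary token, i.e. direction @ W_U.

    `unembedding` is a d_model x vocab_size matrix given as nested lists.
    """
    vocab_size = len(unembedding[0])
    return [
        sum(direction[i] * unembedding[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_logits(direction: Sequence[float],
               unembedding: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 3) -> Tuple[List[Pair], List[Pair]]:
    """Return (most negative k, most positive k) (token, logit) pairs."""
    logits = project_to_vocab(direction, unembedding)
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]    # most promoted tokens, most positive first
    return negative, positive


# Hypothetical toy model: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["also", "Also", "isay", "oss"]
unembedding = [
    [0.5, 0.4, -0.3, -0.2],  # residual dimension 0
    [0.1, 0.2, -0.1, 0.0],   # residual dimension 1
]
direction = [0.3, 0.1]

negative, positive = top_logits(direction, unembedding, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Sorting once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid a full sort.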
Activation Density 0.303%
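Activation density is commonly reported as the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. The sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations over 8 tokens: the feature fires on 2 of them.
acts = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```

Under this definition, the 0.303% reported here means the feature fires on roughly 3 of every 1,000 tokens.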