Explanations
references to the pronoun "she"
Negative Logits
kefeller      -0.84
antage        -0.75
emetery       -0.72
odder         -0.71
vernment      -0.69
Observatory   -0.68
PDATE         -0.66
Skydragon     -0.65
undo          -0.65
atory         -0.65
Positive Logits
herself       1.50
pher          1.43
athed         1.27
athing        1.23
pard          1.21
pherd         1.11
ffield        1.10
ikh           0.99
ppard         0.98
lled          0.98
Activations Density: 0.114%