INDEX
Explanations
mentions of the word "her" in various contexts
Negative Logits
ra       -0.17
erc      -0.17
ers      -0.15
omed     -0.15
urn      -0.15
ela      -0.15
resse    -0.15
sword    -0.15
sd       -0.15
s        -0.15
Positive Logits
bst           0.22
itage         0.20
editary       0.19
Majesty       0.18
OwnProperty   0.17
usalem        0.17
vey           0.17
bage          0.17
encia         0.16
çĸ            0.16
Activations Density 0.025%