INDEX
Explanations
mentions of a person named "Her"
references to a specific individual or character named "Her"
Negative Logits
Token       Logit
————        -0.77
ype         -0.76
ozy         -0.74
ypes        -0.74
inctions    -0.69
eering      -0.69
ornia       -0.69
anamo       -0.69
yip         -0.67
VERTIS      -0.65
Positive Logits
Token         Logit
itage         1.44
Majesty       1.36
metic         1.15
ding          1.10
itability     1.05
acl           1.05
cule          1.04
self          0.97
mits          0.97
mit           0.96
Activations Density: 0.064%