INDEX
Explanations
phrases related to people's names or titles
instances of a specific entity or name associated with "Her."
Negative Logits
Token        Logit
ypes         -0.74
————         -0.73
eering       -0.70
ozy          -0.67
ype          -0.65
Strauss      -0.64
eers         -0.62
éĹĺ          -0.62
govtrack     -0.62
Gutenberg    -0.62
Positive Logits
Token        Logit
itage        1.55
Majesty      1.33
metic        1.21
ding         1.18
acl          1.17
mit          1.14
cule         1.13
itability    1.12
mits         1.07
bal          1.03
Activations Density: 0.041%