Explanations
References to a specific person named "Her" in the context of various activities and places
References to a specific individual named "Her"
Negative Logits
eering              -0.77
————                -0.75
ozy                 -0.71
––                  -0.68
eers                -0.65
±                   -0.64
—-                  -0.62
=-=-=-=-=-=-=-=-    -0.62
TRANS               -0.62
ioxide              -0.61
Positive Logits
itage                1.36
metic                1.07
acl                  1.02
self                 1.00
Majesty              0.99
itability            0.93
ding                 0.93
own                  0.92
rera                 0.91
bage                 0.89
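This page does not say how the two logit lists are produced. A common convention in sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix, score every vocabulary token by the resulting dot product, and report the highest and lowest scores. The sketch below assumes that convention. The function names `logit_effects` and `top_and_bottom` and the toy three-dimensional vectors are hypothetical. Real unembeddings have tens of thousands of rows, and the toy scores are not the values listed above.

```python
# Sketch under an ASSUMPTION: a token's "logit" score is the dot product of the
# feature's decoder direction with that token's unembedding vector.
# Names and numbers here are hypothetical, for illustration only.

def logit_effects(decoder_dir, unembed):
    """Return {token: score}, where score = dot(decoder_dir, unembed[token])."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (most negative first)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

# Toy usage: a 3-dimensional "model" with a four-token vocabulary.
decoder_dir = [0.5, -1.0, 0.25]
unembed = {
    "itage": [2.0, -0.5, 0.0],
    "self": [1.0, 0.0, 0.4],
    "ozy": [-0.6, 0.4, 0.0],
    "eers": [0.0, 0.3, -1.0],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), 2)
print([t for t, _ in top])     # → ['itage', 'self']
print([t for t, _ in bottom])  # → ['ozy', 'eers']
```

Sorting the full score dictionary is the simplest choice for a toy vocabulary. For a real vocabulary a partial selection such as `heapq.nlargest` would avoid sorting every token.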
Activation Density: 0.019%
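Activation density is read here as the percentage of token positions on which the feature fires. A density of 0.019% means about 19 firing tokens per 100,000, or roughly one in every 5,300. The sketch below assumes "fires" means an activation strictly above zero. The `threshold` parameter and the function name `activation_density` are hypothetical, and the dashboard's actual threshold and sample set are not stated on this page.

```python
# Sketch under an ASSUMPTION: density = share of token positions whose feature
# activation exceeds a threshold (0.0 by default), reported as a percentage.

def activation_density(activations, threshold=0.0):
    """Percentage of positions where the activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy usage: 19 firing positions out of 100,000.
acts = [0.0] * 99_981 + [1.2] * 19
density = activation_density(acts)
print(f"{density:.3f}%")  # → 0.019%
```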