Explanations
references to possessive pronouns such as "their" and "own"
references to ownership or possession
Negative Logits
Token     Logit
vine      -0.95
Unsure    -0.86
Flan      -0.72
ozy       -0.69
leaf      -0.68
ctor      -0.68
bender    -0.67
uph       -0.67
epad      -0.67
atoon     -0.66
Positive Logits
Token           Logit
own             1.81
respective      1.63
selves          1.51
selves          1.35
counterparts    1.23
careers         1.22
minds           1.21
hearts          1.20
lives           1.19
OWN             1.14
Activations Density: 0.219%