INDEX
Explanations
possessive pronouns and references to ownership
Negative Logits
possibility   -0.14
atti          -0.14
(             -0.14
onical        -0.13
loquent       -0.13
brink         -0.12
ises          -0.12
thing         -0.12
Sibling       -0.12
igure         -0.12
Positive Logits
own           0.31
/her          0.23
zelf          0.22
self          0.21
próp          0.21
rtle          0.21
SELF          0.19
_own          0.19
Own           0.18
Own           0.18
Activations Density 0.991%