INDEX
Explanations
possessive pronouns indicating ownership or association
Negative Logits
aroo      -0.16
.cbo      -0.14
eg        -0.14
osg       -0.14
sibling   -0.14
egin      -0.14
ises      -0.13
ichick    -0.13
zet       -0.13
ington    -0.13
Positive Logits
own       0.28
/her      0.20
self      0.18
SELF      0.18
próp      0.18
Own       0.18
panic     0.16
zelf      0.16
/us       0.16
Own       0.15
Activations Density 0.995%