INDEX
Explanations
pronouns that indicate possession or ownership
Negative Logits
Token     Logit
thing     -0.16
onical    -0.16
aroo      -0.15
atti      -0.14
uur       -0.13
ostel     -0.13
gang      -0.13
cs        -0.13
ington    -0.12
ises      -0.12
Positive Logits
Token     Logit
own       0.26
/her      0.22
zelf      0.21
self      0.20
próp      0.19
Own       0.16
Own       0.16
_own      0.16
SELF      0.16
utta      0.15
Activations Density 0.960%