Explanations
possessive pronouns indicating ownership or association
Negative Logits
Token        Logit
ington       -0.15
atti         -0.15
ness         -0.13
(            -0.13
ict          -0.13
アル         -0.13
的大         -0.13
possibility  -0.13
illery       -0.13
itive        -0.12
Positive Logits
Token        Logit
own          0.39
/her         0.29
SELF         0.25
próp         0.25
Own          0.23
Own          0.23
zelf         0.22
own          0.21
self         0.20
OWN          0.20
Activations Density 0.950%
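
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below shows that calculation under this assumption. The dashboard's exact definition is not given here, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens where the feature
    is active (activation > threshold). This is a guess at the metric,
    not the dashboard's documented definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 10,000 token positions, 95 of them with a nonzero activation.
sample = [0.0] * 9905 + [1.2] * 95
print(f"{activation_density(sample):.3%}")
```

A density below 1% would mean the feature is sparse: it stays silent on the large majority of tokens and fires only in a narrow set of contexts.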