INDEX
Explanations
possessive pronouns related to personal experiences or belongings
Negative Logits
rů            -0.15
jedn          -0.15
ories         -0.15
yourselves    -0.14
Ë             -0.14
mi            -0.14
themselves    -0.14
ź             -0.14
unanim        -0.14
s             -0.13
Positive Logits
myself        0.36
riad          0.32
rtle          0.31
opic          0.29
opia          0.29
anmar         0.29
ri            0.26
SELF          0.26
embros        0.24
/my           0.23
Activations Density: 0.133%