Explanations
possessive pronouns referring to the speaker's experiences or belongings
Negative Logits
Token         Logit
ary           -0.15
marks         -0.15
yourselves    -0.14
mark          -0.14
light         -0.14
ict           -0.14
hower         -0.14
markt         -0.13
tails         -0.13
gether        -0.13
Positive Logits
Token         Logit
rtle          0.27
SELF          0.24
own           0.23
/her          0.23
zelf          0.22
opia          0.21
self          0.21
opic          0.19
/us           0.19
anmar         0.19
Activations Density: 0.128%