INDEX
Explanations
possessive pronouns and their references
NEGATIVE LOGITS
hiba      -0.17
的大      -0.16
んど      -0.15
persons   -0.14
ingly     -0.14
ndef      -0.14
的手      -0.14
Rarity    -0.14
azers     -0.14
luv       -0.14
POSITIVE LOGITS
/her      0.48
panic     0.34
/she      0.33
sing      0.29
idi       0.24
himself   0.23
pter      0.20
zelf      0.20
avier     0.20
Majesty   0.20
Activations Density 0.227%