INDEX
Explanations
possessive pronouns indicating ownership or relation
Negative Logits
owo         -0.20
inus        -0.17
ERGE        -0.15
stk         -0.15
_RENDERER   -0.15
chặt        -0.15
onde        -0.15
eva         -0.14
ofi         -0.14
tg          -0.14
Positive Logits
quest       0.17
entirety    0.16
croft       0.16
own         0.16
spare       0.15
favor       0.15
227         0.15
teens       0.14
문          0.14
absence     0.14
Activations Density: 0.095%