INDEX
Explanations
possessive pronouns that refer to ownership or belonging
Negative Logits
getti    -0.20
arget    -0.15
Qin      -0.15
ery      -0.14
oleÄį    -0.13
unate    -0.13
abin     -0.13
acc      -0.13
uar      -0.13
omain    -0.13
Positive Logits
ogui     0.16
nep      0.15
enia     0.14
adia     0.14
odon     0.14
_allow   0.14
ìĺģ      0.14
ESS      0.14
幸       0.13
osci     0.13
Activations Density: 0.071%