INDEX
Explanations
pronouns and personal references
Negative Logits
ather      -0.15
Favor      -0.15
conditions -0.14
ahlen      -0.14
Gre        -0.14
xuyên      -0.14
(rank      -0.14
ÅĪ         -0.14
ANTA       -0.14
Rank       -0.13
Positive Logits
claim      0.20
Claim      0.20
claiming   0.20
Claim      0.19
CLAIM      0.17
claim      0.17
CLAIM      0.17
Ñĥка       0.17
æĸŃ        0.16
_claim     0.16
Activations Density: 0.003%