INDEX
Explanations
the first-person singular pronoun and related possessive forms
Negative Logits
ynos      -0.17
abble     -0.17
**/↵↵     -0.15
illance   -0.15
ovol      -0.15
asonry    -0.15
oust      -0.15
loid      -0.14
me        -0.14
onas      -0.14
Positive Logits
/or       0.26
nbsp      0.23
amp       0.21
apos      0.19
bull      0.19
bullet    0.18
roz       0.16
kate      0.16
rea       0.16
æĬ¼       0.16
Activations Density 0.198%