INDEX
Explanations
pronouns and their associated verbs, reflecting personal perspectives and interactions
Negative Logits
571       -0.18
entar     -0.17
ixin      -0.16
claimer   -0.15
onta      -0.15
outh      -0.15
erve      -0.15
iko       -0.14
athing    -0.14
ором      -0.14
Positive Logits
pmat      0.15
ISIBLE    0.15
wonder    0.15
Cheng     0.15
伏        0.14
okino     0.14
WINAPI    0.14
adr       0.14
aleigh    0.14
ihanna    0.14
Activations Density 0.303%