INDEX
Explanations
pronouns indicating possession or association
Negative Logits
aris     -0.17
way      -0.16
iller    -0.15
405      -0.14
paged    -0.14
же       -0.14
airo     -0.13
ly       -0.13
ailer    -0.13
oux      -0.13
Positive Logits
sorts    0.17
iej      0.15
-course  0.15
estre    0.14
curity   0.14
loe      0.14
ルク     0.14
APE      0.13
HEN      0.13
iaux     0.13
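
The two lists above hold the ten tokens with the lowest and the highest logit values for this feature. As a minimal sketch, assuming the per-token logit values are available as a plain mapping from token string to float, lists like these could be extracted as follows. The function name `top_logits` and the six-entry sample are hypothetical; the sample values are copied from the listing above, and nothing here is claimed to be the dashboard's actual implementation.

```python
def top_logits(token_logits, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    token_logits: dict mapping token string -> logit value (assumed input format).
    """
    items = list(token_logits.items())
    # Most negative first.
    negative = sorted(items, key=lambda kv: kv[1])[:k]
    # Most positive first.
    positive = sorted(items, key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


# Hypothetical sample: a few entries taken from the listing above.
sample = {
    "aris": -0.17,
    "way": -0.16,
    "iller": -0.15,
    "sorts": 0.17,
    "iej": 0.15,
    "estre": 0.14,
}

negative, positive = top_logits(sample, k=2)
print(negative)
print(positive)
```

Ties between equal logit values keep their input order here, so the exact order of tied tokens in a real dashboard may differ.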
Activations Density 0.137%
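
Activation density is usually read as the fraction of sampled token positions on which the feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above zero and that activations arrive as a flat list of floats. The function name `activation_density`, the threshold, and the synthetic sample are all assumptions for illustration; the sample is constructed to reproduce the 0.137% figure and is not the data behind it.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample (hypothetical): 100,000 token positions, 137 of them active.
sample = [0.0] * 100_000
for i in range(137):
    sample[i * 700] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density:.3%}")
```

With a density this low, the feature is silent on roughly 999 of every 1,000 tokens, so its top-activating examples say more about it than its average behavior does.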