Explanations
Instances of specific suffixes or word endings.
Negative Logits
tot       -0.18
Ñı        -0.17
tel       -0.16
ept       -0.16
orld      -0.16
éro       -0.16
tep       -0.16
tega      -0.16
letcher   -0.16
SSION     -0.15
Positive Logits
ey        0.19
presso    0.18
oteric    0.17
ophage    0.17
itated    0.17
itating   0.17
apeake    0.17
pecially  0.17
ellschaft 0.16
eker      0.16
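The dashboard does not state how these logit lists were produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and report the tokens with the most positive and most negative scores. The sketch below illustrates that idea with toy data; the function names `logit_effects` and `top_tokens` are hypothetical, and `W_U` is assumed to be stored as a d_model x vocab list of rows.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Assumed method: dot the feature's decoder direction with every
    vocabulary column of the unembedding matrix W_U (d_model rows x vocab columns)."""
    d_model = len(decoder_dir)
    vocab = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][v] for i in range(d_model)) for v in range(vocab)]


def top_tokens(effects: List[float], vocab_strs: List[str], k: int,
               largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or, if largest=False, the most
    negative) logit effect, rounded to two decimals like the dashboard."""
    order = sorted(range(len(effects)), key=lambda v: effects[v], reverse=largest)
    return [(vocab_strs[v], round(effects[v], 2)) for v in order[:k]]


if __name__ == "__main__":
    # Toy example (hypothetical values): d_model = 2, vocabulary of 3 tokens.
    direction = [1.0, 0.0]
    W_U = [[0.5, -0.2, 0.1],
           [9.0, 9.0, 9.0]]
    effects = logit_effects(direction, W_U)
    print(top_tokens(effects, ["a", "b", "c"], k=2))                 # most positive
    print(top_tokens(effects, ["a", "b", "c"], k=1, largest=False))  # most negative
```

Under this assumption, a suffix-like feature would push up tokens such as "itated" or "pecially" and push down unrelated fragments, which matches the shape of the two lists above.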
Activation density: 0.032%
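Activation density is usually read as the share of tokens on which the feature fires; 0.032% corresponds to about 32 tokens in every 100,000 (roughly 1 in 3,125), so this feature is very sparse. The sketch below assumes "fires" means an activation strictly above zero; the dashboard's actual threshold and token sample are not given, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition; the threshold of 0.0 is an assumption)."""
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    # Toy sample: the feature fires on 32 of 100,000 tokens.
    acts = [0.0] * 99_968 + [1.0] * 32
    print(f"{activation_density(acts):.3f}%")
```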