Explanations
Articles and determiners, particularly variations of "the".
Negative Logits
izer      -0.17
ani       -0.17
ader      -0.16
anni      -0.15
ives      -0.15
iser      -0.15
agn       -0.15
kovi      -0.15
Hart      -0.14
aly       -0.14
Positive Logits
バイ       0.15
utta       0.15
utsch      0.15
üstü       0.14
/loose     0.14
gons       0.14
Už         0.14
:init      0.14
richt      0.14
CAUSED     0.14
Activation Density: 0.050%