INDEX
Explanations
the presence of verbs and indications of actions performed
multilingual word endings
Negative Logits
Token     Logit
a         -0.29
-0.27     -0.24
、        -0.24
y         -0.23
Sternen   -0.23
or        -0.23
no        -0.23
c         -0.22
and       -0.22
h         -0.21
Positive Logits
Token        Logit
zwiſchen     1.28
iſchen       1.28
[@BOS@]      1.27
<unused14>   1.27
<unused79>   1.27
niſſe        1.27
<unused74>   1.27
<unused43>   1.27
<unused28>   1.27
<unused41>   1.27
Activations Density: 0.014%