Explanations
contextually significant phrases related to observations, experiences, and nuanced expressions of thought
Negative Logits
zwiſchen    -0.76
ſelben      -0.72
ſammen      -0.71
︐           -0.71
unſer       -0.71
ſelbſt      -0.71
tartalo     -0.70
erſten      -0.69
<pad>       -0.69
[@BOS@]     -0.69
Positive Logits
normally      0.31
pyx           0.28
mi            0.28
setWindow     0.28
gambe         0.28
MD            0.27
↵↵            0.26
ProtoMessage  0.26
!             0.26
ni            0.26
Activations Density 0.110%