Explanations
phrases beginning with the word "start."
Negative Logits
Token     Logit
stime     -0.17
borg      -0.16
堂        -0.15
_:*       -0.14
iner      -0.14
wer       -0.14
Shorts    -0.14
到底      -0.13
stal      -0.13
fg        -0.13
Positive Logits
Token     Logit
slow      0.20
innoc     0.19
simples   0.18
simple    0.18
-simple   0.18
small     0.17
simple    0.16
简单      0.16
slow      0.16
einfach   0.16
Activations Density 0.051%