Explanations
phrases indicating progression or advancement
Negative Logits
kur       -0.17
æĿ¥èĩª    -0.16
rap       -0.16
оба       -0.15
elsen     -0.15
rape      -0.14
ussen     -0.14
ekim      -0.14
/linux    -0.14
loat      -0.14
Positive Logits
raquo     0.15
fray      0.15
beyond    0.15
632       0.15
åŁ        0.14
ÙĪÙī      0.14
ellig     0.14
ê¸ī       0.14
ional     0.13
utting    0.13
Activations Density: 0.088%