Explanations
phrases indicating progression or escalation in actions or ideas
Negative Logits
ác          -0.17
antt        -0.17
loor        -0.16
_BS         -0.16
EINVAL      -0.15
ollah       -0.14
æ°ĹãģĮ      -0.14
achelor     -0.14
гÑĢо        -0.14
egin        -0.14
Positive Logits
further     0.39
Further     0.30
Further     0.27
è¿Ľä¸ĢæŃ¥    0.25
step        0.24
far         0.21
farther     0.21
beyond      0.19
ä¸ĢæŃ¥      0.18
steps       0.18
Activations Density: 0.028%