Explanations
instances of the word "stop" in various forms
Negative Logits
uju       -0.18
cul       -0.17
xaa       -0.17
kv        -0.17
_NOP      -0.15
ñana      -0.15
eel       -0.15
fter      -0.15
undles    -0.14
estar     -0.14
Positive Logits
533       0.16
over      0.16
ub        0.15
285       0.15
090       0.15
653       0.15
insk      0.15
stop      0.14
NCY       0.14
375       0.14
Activations Density: 0.028%