INDEX
Explanations
terms related to stopping or halting actions
Negative Logits
Token     Logit
/loose    -0.15
Dün       -0.13
Evet      -0.13
ledik     -0.13
inv       -0.13
ervers    -0.13
ullan     -0.12
ague      -0.12
/MPL      -0.12
esome     -0.12
Positive Logits
Token     Logit
stop      0.87
stops     0.77
stopped   0.77
STOP      0.77
Stop      0.77
-stop     0.76
stopping  0.74
stop      0.73
halt      0.71
Stop      0.71
Activations Density: 0.218%