INDEX
Explanations
actions related to stopping or pausing activities
Negative Logits
Token      Logit
ÃŃc        -0.16
opoulos    -0.15
imdi       -0.15
vala       -0.14
zp         -0.14
cke        -0.14
zik        -0.14
anel       -0.14
cent       -0.14
ceph       -0.14
Positive Logits
Token      Logit
STOP       0.19
stops      0.19
stopped    0.18
-stop      0.18
stop       0.18
711        0.17
STOP       0.17
stop       0.17
Stop       0.16
stopped    0.16
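
The page does not say how the two logit lists are produced. A minimal sketch, assuming each token has a single score for how much the feature raises or lowers its output logit, is to sort the scores and take both ends. The function name `rank_logit_effects` and the `sample` dictionary are hypothetical; the numbers in `sample` are copied from the lists above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def rank_logit_effects(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return (k most negative, k most positive) token/score pairs."""
    # Sort ascending by score; Python's sort is stable, so ties keep insertion order.
    ordered = sorted(effects.items(), key=lambda item: item[1])
    negative = ordered[:k]        # lowest scores first
    positive = ordered[::-1][:k]  # highest scores first
    return negative, positive


# Hypothetical six-token vocabulary; a real vocabulary has tens of thousands of entries.
sample = {
    "STOP": 0.19,
    "stopped": 0.18,
    "711": 0.17,
    "opoulos": -0.15,
    "vala": -0.14,
    "cent": -0.14,
}

negative, positive = rank_logit_effects(sample, k=2)
print("negative:", negative)
print("positive:", positive)
```

With a full vocabulary, the same call with `k=10` would give two ten-row lists of the shape shown above.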
Activations Density 0.042%
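
A density of 0.042% corresponds to roughly 42 active token positions per 100,000. The sketch below assumes density means the percentage of token positions where the feature's activation is above a threshold (zero here); the page does not state the exact definition. The function name `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 42 active positions out of 100,000 tokens.
acts = [1.5] * 42 + [0.0] * 99_958
density = activation_density(acts)
print(f"{density:.3f}%")  # prints 0.042%
```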