Explanations
The neuron fires on action-oriented instruction words, particularly verbs that direct steps in a procedural or advisory context.
Negative Logits
Token       Logit
_hw         -0.07
naire       -0.07
tabla       -0.07
whe         -0.07
_marker     -0.06
maxWidth    -0.06
tak         -0.06
ungal       -0.06
ε           -0.06
_magic      -0.06
Positive Logits
Token       Logit
.What       0.07
源          0.06
anom        0.06
слишком     0.06
ustom       0.06
Awards      0.06
.Man        0.06
"','        0.06
�           0.06
شف          0.06
Activation Density: 0.235%
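The page reports a density figure without defining it. A minimal sketch, assuming density means the share of token positions on which the neuron's activation is strictly positive; the dashboard's actual threshold and token sample are not stated here, and the function name `activation_density` and the activation values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default); this is not confirmed by the page.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations for 1,000 token positions, with only 3 of them firing.
acts = [0.0] * 1000
for i in (3, 250, 999):
    acts[i] = 1.2

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.235% would mean the neuron is active on roughly 2 to 3 tokens out of every 1,000, which fits a sparse, narrowly tuned unit.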