Explanations
This neuron fires on instruction-style or template tokens in the prompt, for example header words such as “Your,” “should,” and “Here,” as well as numeric placeholders.
Negative Logits
phases         -0.08
cts            -0.07
orarily        -0.07
:m             -0.07
timeout        -0.07
consecutive    -0.07
.tabs          -0.06
:(             -0.06
aspects        -0.06
облад          -0.06
Positive Logits
Venom           0.07
夫人            0.06
EIF             0.06
insufficient    0.06
_RECV           0.06
uten            0.06
vern            0.06
erculosis       0.06
milf            0.06
어나            0.06
Activations Density 0.008%
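
A density figure like the one above is commonly read as the fraction of sampled tokens on which the neuron fires. The page does not state its definition, so the sketch below assumes that reading with a zero activation threshold. The helper name `activation_density` is hypothetical and the sample values are made up for illustration only.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation > threshold.
    This is not confirmed by the source page.
    """
    values = list(activations)
    if not values:
        # No tokens sampled: report zero rather than dividing by zero.
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Illustrative data only: 1 active token out of 12,500 sampled tokens.
sample = [0.0] * 12_499 + [1.7]
print(f"{activation_density(sample):.3%}")  # prints 0.008%
```

Under this assumption, a density of 0.008% corresponds to roughly one firing token per 12,500 sampled, which would make this a very sparse neuron.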