INDEX
Explanations
This neuron activates on tokens inside parenthetical length-limit instructions, most strongly on the “no more than X words” framing (e.g., a “(no … words)” constraint).
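As a rough illustration of the kind of text this explanation describes, the sketch below flags parenthetical word-limit instructions with a regular expression. The pattern, the list of limit phrasings, and the name `find_length_limits` are assumptions made for this example; they are not derived from the neuron itself and will not reproduce its activations.

```python
import re

# Hypothetical pattern (an assumption, not taken from the dashboard):
# a parenthetical containing a limit phrase, a number, and "word(s)".
LENGTH_LIMIT = re.compile(
    r"\([^()]*\b(?:no more than|at most|maximum(?: of)?|up to)"
    r"\s+\d+\s+words?\b[^()]*\)",
    re.IGNORECASE,
)


def find_length_limits(text: str) -> list[str]:
    """Return every parenthetical length-limit instruction found in text."""
    return [match.group(0) for match in LENGTH_LIMIT.finditer(text)]


if __name__ == "__main__":
    prompt = "Summarize the article (no more than 50 words)."
    print(find_length_limits(prompt))
```

A text search like this only picks out candidate prompts to inspect; whether the neuron actually fires on them has to be checked against recorded activations.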
Negative Logits
학기    -0.06
icing    -0.06
.vert    -0.06
shown    -0.06
сентября    -0.06
Hola    -0.06
Anthony    -0.06
826    -0.06
dealt    -0.06
ilde    -0.05
Positive Logits
ledged    0.07
Applied    0.07
UTDOWN    0.07
작업    0.07
pes    0.07
refresh    0.06
Completed    0.06
(domain    0.06
chio    0.06
ğu    0.06
Activations Density 0.021%