Explanations
This neuron activates on numeric tokens that specify the requested word count in the prompt (e.g. the “2000 words” figure).
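To make the explanation above concrete, here is a minimal sketch of a text-level heuristic that picks out the kind of number it describes: a figure immediately followed by "word" or "words". The regular expression and the function name `find_word_count_figures` are illustrative assumptions, not part of the dashboard. The sketch works on raw text rather than model tokens, so it only approximates where such a neuron would be expected to fire.

```python
import re

# Hypothetical heuristic (an assumption, not the neuron itself): a number
# followed by "word"/"words", separated by whitespace or a hyphen.
WORD_COUNT_RE = re.compile(r"\b(\d[\d,]*)(?:\s*-\s*|\s+)words?\b", re.IGNORECASE)


def find_word_count_figures(prompt: str) -> list[str]:
    """Return the numeric strings that state a requested word count."""
    return [match.group(1) for match in WORD_COUNT_RE.finditer(prompt)]


if __name__ == "__main__":
    for text in (
        "Write a 2000 words essay about owls.",
        "Give me a 500-word summary.",
        "In 2000 the company moved.",
    ):
        print(text, "->", find_word_count_figures(text))
```

Numbers that are not tied to a word count, such as a year, are deliberately left out. That matches the explanation's claim that the neuron responds to the requested length and not to numerals in general.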
Negative Logits
Token              Logit
verified           -0.08
centerY            -0.07
ots                -0.06
von                -0.06
Ness               -0.06
questionable       -0.06
Bella              -0.06
.EOF               -0.06
eligible           -0.06
differentiation    -0.06
Positive Logits
Token              Logit
“你                0.06
ourse              0.06
(proj              0.06
ustanov            0.06
firstname          0.06
paypal             0.06
ilmington          0.06
домов              0.06
Usuario            0.06
emploi             0.06
Activations Density 0.006%