INDEX
Explanations
This neuron detects question- and command-starter tokens in math problems: words like “What,” “Divide,” “Calculate,” “Work,” “Round,” “Convert,” and similar prompt verbs.
Negative Logits
opus
-0.07
(!(
-0.06
шем
-0.06
ी।↵
-0.06
short
-0.06
็นว
-0.06
-0.06
embroidery
-0.06
typings
-0.06
ському
-0.06
Positive Logits
(__('
0.06
=back
0.06
cita
0.06
edral
0.06
ry
0.06
boo
0.06
hac
0.06
quartered
0.06
FillColor
0.06
있을
0.06
Activations Density 0.013%
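For orientation, the density figure can be read as the share of token positions on which the neuron fires at all. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active tokens) / (total tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 13 active positions out of 100,000 tokens.
acts = [0.0] * 99_987 + [1.5] * 13
density = activation_density(acts)
print(f"{density:.3%}")  # 0.013%
```

Under this reading, a density of 0.013% corresponds to roughly 13 active tokens per 100,000, which fits a neuron that responds only to a narrow set of prompt-starting words.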