Explanations
The neuron fires on tokens that signal step-by-step mathematical reasoning: words like “First,” “Next,” “Therefore,” “solve,” and “simplify,” as well as the numerical expressions used when working through a solution.
Negative Logits
اجر        -0.07
boğ        -0.06
\/\/       -0.06
玄         -0.06
щее        -0.06
Аб         -0.06
Storage    -0.06
小时       -0.06
tasting    -0.06
_DU        -0.06
Positive Logits
itimate    0.07
vents      0.06
eller      0.06
minate     0.06
.','       0.06
cre        0.06
sher       0.06
cesso      0.06
äd         0.05
люб        0.05
Activations Density 0.019%
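
For a sense of scale, a density of 0.019% works out to roughly 19 active token positions per 100,000. The sketch below assumes that density means the fraction of token positions where the neuron's activation is above zero. The page does not state this definition, and the function name, threshold, and sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts positions with activation > 0; the page
    itself does not spell out the definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 19 active positions out of 100,000 tokens.
acts = [0.0] * 99_981 + [1.5] * 19
density = activation_density(acts)
print(f"{density:.3%}")  # 0.019%
```

A neuron this sparse is silent on almost all text, which fits an explanation tied to a narrow kind of content such as worked math solutions.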