Explanations
calculations
This neuron activates on the start-of-header marker and on instructional or explanatory wording, i.e. phrases that introduce or structure step-by-step calculations.
Negative Logits
Token        Logit
Functional   -0.07
(utf         -0.07
_DOMAIN      -0.07
Lady         -0.06
XYZ          -0.06
orchestr     -0.06
Portrait     -0.06
Dog          -0.06
ียรต         -0.06
erra         -0.06
Positive Logits
Token        Logit
jeta         0.07
acceptable   0.07
заліз        0.06
.getLog      0.06
cence        0.06
lấy          0.06
lately       0.06
carb         0.06
çab          0.06
marvel       0.06
Activations Density 0.055%