    Explanations

    calculations

    This neuron activates on the start-of-header marker and on instructional or explanatory wording, i.e. phrases that introduce or structure step-by-step calculations.

    Negative Logits
    Token           Logit
    "Functional"    -0.07
    "(utf"          -0.07
    "_DOMAIN"       -0.07
    "Lady"          -0.06
    "XYZ"           -0.06
    " orchestr"     -0.06
    "Portrait"      -0.06
    "Dog"           -0.06
    "ียรต"          -0.06
    "erra"          -0.06
    Positive Logits
    Token           Logit
    "jeta"          0.07
    " acceptable"   0.07
    " заліз"        0.06
    ".getLog"       0.06
    "cence"         0.06
    " lấy"          0.06
    " lately"       0.06
    " carb"         0.06
    " çab"          0.06
    " marvel"       0.06
    Activation Density: 0.055%

    No Known Activations