INDEX
    Explanations

    The neuron activates on subword pieces that mark list or enumeration formatting (e.g. numbered bullets, asterisks, item‐start tokens).

    Negative Logits
     cellar
    -0.07
    (in
    -0.06
     barley
    -0.06
    Split
    -0.06
    /design
    -0.06
     Hernandez
    -0.06
     bosses
    -0.06
    ervations
    -0.06
     cousin
    -0.06
     liability
    -0.06
    Positive Logits
    0.08
    addListener
    0.07
    .findAll
    0.06
    sizlik
    0.06
    ="'.
    0.06
    orWhere
    0.06
    ości
    0.06
    uming
    0.06
    енными
    0.06
    (!$
    0.06
    Act Density 0.020%

    No Known Activations