    Explanations

    The neuron is primarily triggered by uppercase abbreviations and acronyms (multi‐letter all-caps tokens).

    Negative Logits
    ing          -0.07
    (blank)      -0.07
    —even        -0.07
    (blank)      -0.07
    也不         -0.06
     narrowed    -0.06
    not          -0.06
     від         -0.06
    ้จ           -0.06
     well        -0.06
    Positive Logits
    a            0.21
    ra           0.18
    ula          0.17
    ka           0.17
    ga           0.16
    ha           0.16
    RA           0.16
    la           0.16
    A            0.16
    pa           0.16
    Activation Density: 0.779%

    No Known Activations