INDEX
    Explanations

    code/programming

    This neuron activates on the “y” string, specifically on the word “concerning” and on the gerund or verb tokens that follow it and describe harmful or malicious actions.

    Negative Logits
    Token                 Logit
    "FIT"                 -0.06
    "_filt"               -0.06
    "027"                 -0.06
    "isEnabled"           -0.06
    "prt"                 -0.06
    "唯一"                -0.06
    (token not shown)     -0.06
    "passes"              -0.06
    " senses"             -0.06
    "ceipt"               -0.06
    Positive Logits
    Token                 Logit
    (token not shown)     0.07
    "978"                 0.06
    " simultaneously"     0.06
    " адміністра"         0.06
    " bunların"           0.06
    " Residential"        0.06
    " Alg"                0.06
    " ########."          0.06
    " stirring"           0.06
    "ського"              0.06
    Act Density 0.006%

    No Known Activations