    Explanations

    multiple languages

    The neuron fires on specialized technical terminology, particularly NLP and linguistics jargon, rather than on ordinary words.

    Negative Logits
    Token             Logit
    "ิหาร"             -0.06
    " SWT"            -0.06
    " Pixar"          -0.06
    "load"            -0.06
    "rif"             -0.06
    (not rendered)    -0.06
    " pulls"          -0.06
    (not rendered)    -0.06
    "_Il"             -0.06
    "ICIENT"          -0.06
    Positive Logits
    Token             Logit
    " Successful"      0.07
    " compulsory"      0.06
    "(elem"            0.06
    "ograd"            0.06
    "(errorMessage"    0.06
    " usuarios"        0.06
    " endorsement"     0.06
    "(panel"           0.06
    "cznie"            0.06
    " fraudulent"      0.06
    Act Density 0.123%
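    Activation density is usually read as the share of tokens in an evaluation corpus on which the feature fires. At 0.123%, that is about 1.23 tokens per 1,000, or roughly 1,230 per million. The sketch below assumes a token "fires" when its activation is strictly greater than a threshold of 0, because the page does not state the threshold. The helper name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes "fires" means activation > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one token activation")
    return float((acts > threshold).mean())

# 123 firing tokens out of 100,000 gives a density of 0.00123, i.e. 0.123%.
acts = np.zeros(100_000)
acts[:123] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.123%
```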

    No Known Activations