    Explanations

    The neuron fires on occurrences of language names (e.g. “Portuguese,” “Spanish,” “French,” “Vietnamese,” “Russian”).

    Negative Logits
    Token        Logit
    " Raf"       -0.07
    "latin"      -0.07
    "aza"        -0.06
    "seudo"      -0.06
    "fw"         -0.06
    " Platz"     -0.06
    " flats"     -0.06
    "rats"       -0.06
    "ason"       -0.06
    "цит"        -0.06
    Positive Logits
    Token          Logit
    " conveying"   0.07
    " заяви"       0.06
    (not shown)    0.06
    " 엄마"        0.06
    " SAN"         0.06
    " sausage"     0.06
    "країн"        0.06
    " couleur"     0.06
    "ICIENT"       0.06
    ".me"          0.06
    Activation Density: 0.031%

    No Known Activations