    Explanations

    The neuron strongly activates on isolated single-letter tokens (e.g. standalone variables or letters).

    Negative Logits
    restaurant    -0.08
     مط    -0.07
    SQ    -0.07
     comida    -0.07
    istinguished    -0.06
    /km    -0.06
    アル    -0.06
    ucch    -0.06
     narr    -0.06
    /disc    -0.06
    Positive Logits
    اگر    0.06
    -selection    0.06
    _destroy    0.06
     español    0.06
     steer    0.06
    an    0.06
     spielen    0.06
    +↵    0.06
     elective    0.06
    '])↵    0.06
    Activation Density: 0.047%

    No Known Activations