    Explanations

    code snippets

    The neuron activates on placeholder name tokens (e.g., “NAME_#”); that is, it detects anonymized person-name markers.

    Negative Logits
    Token              Logit
    " wl"              -0.07
    " veil"            -0.06
    "llib"             -0.06
    (not rendered)     -0.06
    " Dış"             -0.06
    (not rendered)     -0.06
    " dado"            -0.06
    " grads"           -0.06
    "_house"           -0.06
    " ніби"            -0.06
    Positive Logits
    Token              Logit
    " coroutine"        0.07
    "ANG"               0.07
    "іка"               0.06
    "391"               0.06
    " zvyš"             0.06
    " Neil"             0.06
    " HIS"              0.06
    " ráp"              0.06
    (not rendered)      0.06
    "سام"               0.06
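    Positive and negative logit lists like these are commonly obtained by projecting the neuron's output weights through the model's unembedding and ranking vocabulary entries by the resulting score. The sketch below assumes exactly that plain "direct logit" projection (a dashboard may additionally fold in layer-norm scaling or other normalization). The helper names `logit_effects` and `top_and_bottom`, the weights, and the three-token vocabulary are hypothetical toy values, not this neuron's data.

```python
from typing import List, Tuple

Pair = Tuple[str, float]

def logit_effects(w_out: List[float], W_U: List[List[float]], vocab: List[str]) -> List[Pair]:
    # Hypothetical helper: dot the neuron's output weights (length d_model)
    # with each column of the unembedding W_U (d_model rows x len(vocab) columns).
    d_model = len(w_out)
    return [
        (token, sum(w_out[i] * W_U[i][j] for i in range(d_model)))
        for j, token in enumerate(vocab)
    ]

def top_and_bottom(effects: List[Pair], k: int) -> Tuple[List[Pair], List[Pair]]:
    # Sort ascending by score: the head holds the most negative logits,
    # the reversed list holds the most positive ones.
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive

# Toy values (hypothetical): d_model = 2, vocabulary of three tokens.
w_out = [1.0, -0.5]
W_U = [
    [0.2, -0.1, 0.05],
    [0.0, 0.1, -0.2],
]
vocab = ["tok_a", "tok_b", "tok_c"]

negative, positive = top_and_bottom(logit_effects(w_out, W_U, vocab), k=1)
print("negative:", negative)
print("positive:", positive)
```

    With a real model, `w_out` would be the neuron's row of the MLP output matrix and `W_U` the full unembedding, and `k` would be 10 to match the lists above.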
    Activation Density: 1.175%
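    Activation density is the share of dataset tokens on which the neuron fires. The sketch below assumes "fires" means an activation strictly above zero; the cutoff a given dashboard uses may differ, and the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    # Percentage of tokens whose activation is strictly above `threshold`.
    # The zero threshold is an assumption; a dashboard may use a different cutoff.
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total

# Hypothetical sample: 47 firing tokens out of 4,000.
sample = [0.8] * 47 + [0.0] * 3953
print(f"{activation_density(sample):.3f}%")
```

    For scale, a density of 1.175% corresponds to 47 active tokens per 4,000 (47 / 4000 = 0.01175); the sample above is illustrative only and is not this neuron's actual count.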

    No Known Activations