    Explanations

    This neuron detects direct second-person address, activating on tokens like “you,” “your,” and references to the human interlocutor.

    Negative Logits
    Token         Logit
    "报道"         -0.07
    " airline"    -0.07
    "uraa"        -0.06
    " اتاق"       -0.06
    " cow"        -0.06
    " fon"        -0.06
    "جاد"         -0.06
    " contacto"   -0.06
    "agascar"     -0.06
    "dio"         -0.06
    Positive Logits
    Token         Logit
    " wrestlers"  0.06
    " буд"        0.06
    " Purch"      0.06
    " Sovere"     0.06
    " خصوص"       0.06
    " Emoji"      0.06
    ".cc"         0.06
    ".ADMIN"      0.06
    "Mui"         0.06
    "้ง"           0.06
    Act Density 0.002%

    No Known Activations