INDEX
    Explanations

    This neuron activates on the special token that marks the assistant speaker role.

    Negative Logits
    Token         Logit
    ]↵            -0.06
     Radi         -0.06
     radi         -0.06
    ")]↵          -0.06
     حمل          -0.06
    یط            -0.05
    /usr          -0.05
     SON          -0.05
     IAM          -0.05
    !"↵           -0.05
    Positive Logits
    Token         Logit
    ộn            0.07
    ْع            0.07
    Advice        0.07
    이어          0.07
    _converter    0.07
    Species       0.07
    TASK          0.07
     moyen        0.07
    ưỡng          0.07
     resembled    0.06
    Activation Density: 0.061%

    No Known Activations