INDEX
    Explanations

    Language or not speaking English

    The neuron activates on phrases where the assistant acknowledges or switches into a particular human language (e.g. “em português,” “en español,” “auf Deutsch,” “en français”).

    Negative Logits
    /md                    -0.07
    .email                 -0.06
     salad                 -0.06
    _REFRESH               -0.06
    (token not shown)      -0.06
     Mars                  -0.06
     fled                  -0.06
    ектив                  -0.06
     bindActionCreators    -0.06
     üzerinde              -0.06
    Positive Logits
    xm          0.07
    ourke       0.06
    _Em         0.06
    áv          0.06
     reasons    0.06
     [↵↵        0.06
    llum        0.06
    riteln      0.06
    hya         0.06
    imiter      0.06
    Act Density 0.037%

    No Known Activations
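
For readers unfamiliar with the "Act Density" figure above, the following is a minimal sketch of how such a value is commonly computed, assuming it means the percentage of token positions on which the neuron's activation is above zero. The page itself does not define the metric, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions that fire at all;
    the source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the neuron.
sample = [0.0] * 9996 + [1.2, 0.4, 3.1, 0.9]
density = activation_density(sample)
print(f"Act Density {density:.3f}%")
```

Under this reading, a density of 0.037% would correspond to roughly 37 active positions per 100,000 tokens, which fits a neuron that fires only on rare language-switching phrases.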