    Explanations

    interpersonal relationships

    emotional expressions and gestures in romantic contexts.

    The neuron detects speaker-turn labels and character identifiers in dialogue (e.g. tokens such as NAME_1 and NAME_2, as well as header or ID markers).

    Negative Logits
    trs            -0.07
    ,j             -0.07
     oggi          -0.07
    ={}            -0.07
    ВС             -0.07
     برخ           -0.06
    IVEN           -0.06
    人の           -0.06
    ốc             -0.06
    orage          -0.06

    Positive Logits
     добавить       0.07
    .Controls       0.07
     ----------↵    0.07
                    0.06
    START           0.06
     banning        0.06
     noci           0.06
     вещ            0.06
    iropr           0.06
     creds          0.06
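    Lists like the ones above are commonly produced by projecting a neuron's output direction onto the model's unembedding matrix and ranking the vocabulary by the resulting score. The sketch below assumes that method (the page itself does not state how its lists were computed). The function names, the toy two-dimensional weights and the three-token vocabulary are hypothetical and only illustrate the ranking step.

```python
# Hypothetical sketch: rank vocabulary tokens by a neuron's direct effect on the logits.
# Assumption: the effect on token t is the dot product of the neuron's output
# direction with column t of the unembedding matrix (d_model rows x vocab columns).

def logit_effects(output_direction, unembed, vocab):
    """Return {token: dot(output_direction, unembedding column for that token)}."""
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * row[j] for d, row in zip(output_direction, unembed))
    return effects


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return negative, positive


if __name__ == "__main__":
    # Toy weights, not taken from any real model.
    direction = [1.0, -0.5]
    unembed = [[0.2, -0.1, 0.0],
               [0.1, 0.2, -0.4]]
    effects = logit_effects(direction, unembed, ["a", "b", "c"])
    neg, pos = top_and_bottom(effects, 1)
    print("Negative:", neg)
    print("Positive:", pos)
```

    Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; a real model would use a matrix library instead of pure-Python loops.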
    Activation Density: 0.045%
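    An activation density of 0.045% means the neuron fires on roughly 45 of every 100,000 tokens. The sketch below assumes density is the fraction of token positions whose activation exceeds a threshold; the threshold of 0.0 and the function name `activation_density` are hypothetical choices, not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy data: 45 active positions out of 100,000.
    acts = [0.0] * 99_955 + [1.0] * 45
    print(f"{activation_density(acts) * 100:.3f}%")
```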

    No Known Activations