    Explanations

    happy greetings and phrases

    The neuron primarily detects the token "happy" and its close variants (such as "Happy"), that is, expressions of happiness and positive greetings.

    Negative Logits

    Token          Value
    "K"            0.85
    "F"            0.76
    "W"            0.76
    "G"            0.74
    (not shown)    0.74
    "E"            0.72
    "R"            0.70
    "M"            0.69
    "ف"            0.69
    "H"            0.68
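
    Positive Logits

    Tokens are quoted so that a leading space, where present, stays visible.

    Token           Value
    " happy"        0.83
    " feliz"        0.79
    " Happy"        0.75
    " mutlu"        0.66
    " felices"      0.61
    " felicidad"    0.60
    "happy"         0.60
    " happier"      0.60
    " bahagia"      0.59
    " HAPPY"        0.57
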
    Activation Density: 0.021%
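
    To give a sense of scale for the density figure above, here is a minimal sketch, assuming that density means the fraction of token positions on which the neuron's activation exceeds a threshold. That definition, the function name activation_density, the zero threshold, and the sample data are all assumptions made for illustration; the page itself does not say how the figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the source page does not
    state how its density figure is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 21 of which are active.
acts = [0.0] * 100_000
for i in range(21):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.021%
```

    Under this reading, a density of 0.021% corresponds to roughly 21 active positions per 100,000 tokens, so the neuron fires very rarely.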

    No Known Activations