INDEX
    Explanations

    expressions of gratitude and kindness

    Negative Logits
    Token           Logit
    " contested"    -0.72
    "scene"         -0.69
    " competing"    -0.68
    " SUP"          -0.68
    " charged"      -0.67
    "NC"            -0.65
    "wang"          -0.64
    "viks"          -0.64
    "merce"         -0.64
    "uter"          -0.64
    Positive Logits
    Token           Logit
    " gratitude"    2.32
    " kindness"     2.28
    " generosity"   2.20
    " curiosity"    2.17
    " humility"     2.11
    " empathy"      1.95
    " cynicism"     1.91
    " optimism"     1.88
    " honesty"      1.85
    " arrogance"    1.84
    Activation Density: 0.051%

    No Known Activations