INDEX
    Explanations

    references to being thankful or showing gratitude

    Negative Logits
    ihar          -0.78
    ilib          -0.77
    ires          -0.77
    urden         -0.75
    apest         -0.75
    emies         -0.75
    icut          -0.73
    terness       -0.73
    ardless       -0.72
    ecycle        -0.72

    Positive Logits
     generous     1.26
     clever       1.03
     generosity   1.01
     diligent     1.01
     ingenuity    0.99
     advancements 0.96
     donations    0.94
     ingenious    0.94
     persever     0.93
     hindsight    0.93

    Activation Density: 0.241%

    No Known Activations