INDEX
    Explanations

    words related to rewards and recognition

    Negative Logits
    Token          Logit
    "LookAnd"      -0.80
    " Alvar"       -0.76
    " Suzy"        -0.72
    " Ganges"      -0.72
    "httphttps"    -0.72
    " Stalin"      -0.70
    "Enders"       -0.70
    " Jace"        -0.69
    " Miscell"     -0.68
    " ciga"        -0.68
    Positive Logits
    Token          Logit
    " rewards"     1.32
    " Rewards"     1.22
    " reward"      1.18
    " rewarding"   1.18
    "Reward"       1.12
    " Reward"      1.11
    "reward"       1.07
    "Rewards"      1.07
    " rewarded"    1.02
    "rewards"      0.99
    Activation Density: 0.080%

    No Known Activations