INDEX
    Explanations

    words and phrases related to being rewarded for certain actions or behaviors

    terms related to rewarding and punishing actions

    Negative Logits
    Token       Logit
    "frame"     -0.75
    "space"     -0.72
    "sie"       -0.72
    "frames"    -0.68
    "cell"      -0.66
    "CON"       -0.66
    "orig"      -0.65
    "alter"     -0.65
    "aug"       -0.65
    "issues"    -0.65
    Positive Logits
    Token           Logit
    " rewarded"     1.54
    " rewarding"    1.15
    " rewards"      1.11
    " reward"       1.08
    " tremend"      0.98
    "nesday"        0.98
    " incentiv"     0.89
    " veter"        0.87
    " reap"         0.87
    " showc"        0.87
    Activation Density: 0.015%

    No Known Activations