INDEX
    Explanations

    terms related to rewards and bonuses

    Negative Logits
    Token           Logit
    "ned"           -0.71
    " Kone"         -0.67
    " Heide"        -0.66
    " jed"          -0.65
    "auto"          -0.65
    "mah"           -0.59
    " Kines"        -0.59
    " Sloan"        -0.59
    " auto"         -0.59
    " Fol"          -0.58
    Positive Logits
    Token           Logit
    " reward"        1.07
    " Reward"        1.06
    "AndEndTag"      0.99
    "Reward"         0.98
    "reward"         0.95
    " incentive"     0.94
    " mixtures"      0.92
    " mixture"       0.90
    " rewards"       0.90
    " incentives"    0.89
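    The logit lists above are assumed here to come from the usual "logit lens" projection used on sparse-autoencoder feature dashboards: the feature's decoder direction is multiplied by the model's unembedding matrix, and the most negative and most positive vocabulary entries are listed. The page does not say this, and it may apply extra scaling, so the sketch below is illustrative only. The function name `feature_logit_lens` and its arguments are hypothetical, and plain Python lists stand in for real weight tensors.

```python
from typing import List, Sequence, Tuple


def feature_logit_lens(
    decoder_dir: Sequence[float],          # hypothetical: feature decoder vector, length d_model
    unembed: Sequence[Sequence[float]],    # hypothetical: unembedding matrix, shape [d_model][vocab]
    vocab: Sequence[str],                  # token strings, leading spaces preserved
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most_negative, most_positive) lists of (token, logit) pairs."""
    d_model = len(decoder_dir)
    n_vocab = len(vocab)
    # logit[v] = sum_i decoder_dir[i] * unembed[i][v], i.e. decoder_dir @ unembed
    logits = [
        sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
        for v in range(n_vocab)
    ]
    # Sort vocabulary indices from most negative to most positive logit.
    order = sorted(range(n_vocab), key=lambda v: logits[v])
    most_negative = [(vocab[v], logits[v]) for v in order[:k]]
    # Largest logits first, matching the dashboard's descending order.
    most_positive = [(vocab[v], logits[v]) for v in reversed(order[-k:])]
    return most_negative, most_positive


# Toy example: a 2-dimensional model with a 3-token vocabulary.
neg, pos = feature_logit_lens(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.5, -0.2, 0.9], [0.3, 0.3, 0.3]],
    vocab=[" reward", "ned", " incentive"],
    k=1,
)
print(neg, pos)
```

    With real weights, `k=10` would produce two ten-row lists shaped like the tables above. Tokens are kept as raw strings so that, for example, " reward" and "reward" remain distinct entries.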
    Activation density: 0.116%
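    Activation density is assumed here to mean the fraction of sampled tokens on which the feature's activation is above zero. Under that reading, 0.116% means the feature fires on roughly 116 of every 100,000 tokens. The threshold and the sampling procedure are assumptions, not taken from this page. The helper `activation_density` below is a hypothetical sketch of the calculation.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy data: 116 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_884 + [0.5] * 116
density = activation_density(acts)
print(f"{density:.3%}")
```

    Sparse features like this one typically have very low densities, so a density reported on a small token sample can be noisy.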

    No Known Activations