INDEX
    Explanations

    phrases related to being rewarded or greeted with something

    phrases indicating a response or result related to receiving something

    Negative Logits
    Token             Logit
    "wake"            -0.76
    "sche"            -0.68
    "urat"            -0.68
    "wash"            -0.66
    "soon"            -0.64
    " deceived"       -0.62
    "alias"           -0.61
    " susceptible"    -0.59
    "affected"        -0.58
    " decom"          -0.58
    Positive Logits
    Token             Logit
    "Interstitial"    0.78
    "arde"            0.74
    "Ĥª"              0.72
    "ļéĨĴ"            0.69
    " applause"       0.69
    "eering"          0.69
    "arrass"          0.69
    " cheers"         0.69
    "¶æ"              0.69
    "æµ"              0.68
    Act Density 0.312%
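
As a rough illustration of what an activation density figure like the one above usually means, the sketch below computes the fraction of token positions on which a feature's activation is non-zero. This assumes the common definition of the metric; the function name `activation_density`, the `threshold` parameter, and the sample counts are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted per token position, with a strict
    greater-than comparison against the threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 12,500 token positions, 39 of which activate the feature.
sample = [0.0] * 12461 + [1.5] * 39
print(f"Act Density {activation_density(sample):.3%}")
```

A density well below 1% indicates a sparse feature that fires on only a small share of tokens.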

    No Known Activations