INDEX
    Explanations

    exclamatory words expressing strong emotions, such as surprise or disbelief

    expressions of disbelief or sarcasm

    Negative Logits
    Token         Logit
    "arthy"       -0.75
    "icipated"    -0.71
    "enary"       -0.71
    "adr"         -0.69
    "athered"     -0.64
    "bett"        -0.63
    "fold"        -0.63
    "idas"        -0.62
    "sylv"        -0.61
    "eded"        -0.60
    Positive Logits
    Token           Logit
    " Seriously"    0.84
    "zers"          0.84
    " dunno"        0.79
    " kidding"      0.78
    "FTWARE"        0.77
    "Seriously"     0.75
    " Helpful"      0.75
    " tho"          0.73
    "?!"            0.70
    "essage"        0.69
    Activation Density: 0.055%

    No Known Activations