INDEX
    Explanations

    expressions of satisfaction or gratitude

    expressions of happiness or contentment

    Negative Logits
    Token           Logit
    "Format"        -0.69
    "Downloadha"    -0.68
    "perse"         -0.66
    "ciplinary"     -0.65
    "ults"          -0.65
    "catentry"      -0.64
    "cum"           -0.64
    "ufact"         -0.64
    "uria"          -0.64
    "cano"          -0.62
    Positive Logits
    Token           Logit
    " they"         0.74
    " Tid"          0.71
    " we"           0.70
    " you"          0.68
    " tid"          0.68
    " THEY"         0.67
    "fully"         0.65
    " somebody"     0.64
    " nobody"       0.63
    "terday"        0.63
    Activation Density: 0.067%

    No Known Activations