INDEX
    Explanations

    phrases indicating humor or sarcasm

    expressions of disbelief or statements that someone is joking

    Negative Logits
    "marked"    -0.84
    "ŃĶ"        -0.83
    "pora"      -0.76
    "part"      -0.74
    "ugal"      -0.72
    "por"       -0.69
    "marks"     -0.67
    "WIND"      -0.67
    "enfranch"  -0.66
    "namese"    -0.65
    Positive Logits
    " kidding"            1.04
    "renheit"             0.76
    " joking"             0.71
    "isSpecialOrderable"  0.69
    " Niet"               0.66
    " aloud"              0.66
    "ayers"               0.64
    "NESS"                0.64
    " WARN"               0.63
    "sters"               0.63
    Activation Density: 0.024%

    No Known Activations