    Explanations

    instances of sarcasm or irony in language

    Negative Logits
    Token       Logit
    "ces"       -0.16
    "itle"      -0.16
    "apan"      -0.15
    "mdat"      -0.15
    "ppv"       -0.15
    "itz"       -0.14
    "ellan"     -0.14
    "adesh"     -0.14
    "unar"      -0.14
    "oston"     -0.14
    Positive Logits
    Token       Logit
    "dns"       0.17
    "wis"       0.16
    " dns"      0.15
    ".rl"       0.14
    "ηÏĤ"       0.14
    "trail"     0.14
    "esti"      0.14
    "Č↵"        0.14
    " dear"     0.14
    " Trails"   0.14
    Activation density: 0.698%

    No Known Activations