INDEX
    Explanations

    statements involving strong claims, warnings, and descriptions from various individuals

    Negative Logits
    impact      -0.67
    ija         -0.65
    mite        -0.63
    gradient    -0.62
    aspx        -0.61
    emaker      -0.60
    mt          -0.59
    ãĥİ         -0.58
    veyard      -0.58
    wayne       -0.57
    Positive Logits
     sarcast        0.82
     bluntly        0.76
     passionately   0.71
     apologised     0.71
    :"              0.68
     remarks        0.68
     apolog         0.67
     apologise      0.66
     listeners      0.66
     forcefully     0.66
    Act Density 0.263%

    No Known Activations