INDEX
    Explanations

    self-referential phrases signaling the speaker's thoughts or actions

    first-person perspective statements and expressions of personal thoughts or feelings

    Negative Logits
    adra
    -0.71
    externalActionCode
    -0.70
    fig
    -0.67
    021
    -0.64
    ×Ļ
    -0.62
    otiation
    -0.60
    WAR
    -0.59
     Nanto
    -0.59
    edition
    -0.59
     è£ıè¦ļéĨĴ
    -0.58
    Positive Logits
     joking
    0.90
     invincible
    0.84
     kidding
    0.77
     kindred
    0.69
     innocuous
    0.65
    might
    0.64
     might
    0.62
    amn
    0.61
    ļé
    0.61
    'd
    0.61
    Act Density 0.169%

    No Known Activations