INDEX
    Explanations

    phrases identifying specific individuals

    references to individuals specifically using the word "himself," "herself," or "themselves."

    Negative Logits
    "onal"            -0.83
    "olid"            -0.71
    "ysical"          -0.70
    "CLOSE"           -0.69
    "SHIP"            -0.66
    " Frenzy"         -0.63
    " convergence"    -0.63
    "ammy"            -0.62
    " RELEASE"        -0.61
    "odies"           -0.60

    Positive Logits
    " admitted"        0.94
    " confessed"       0.88
    " admits"          0.86
    " acknowledged"    0.84
    "龍喚士"            0.81
    " conceded"        0.77
    " penned"          0.77
    " profess"         0.75
    " doubted"         0.74
    " contradicted"    0.74
    Activation Density: 0.039%

    No Known Activations