INDEX
    Explanations

    accusatory statements or allegations

    phrases indicating accusations

    NEGATIVE LOGITS
    Token          Logit
    " Mehran"      -0.77
    "Score"        -0.74
    "ocket"        -0.68
    "alde"         -0.67
    "Tokens"       -0.65
    "Zone"         -0.64
    "dayName"      -0.64
    "edin"         -0.63
    "aths"         -0.63
    "oor"          -0.62
    POSITIVE LOGITS
    Token            Logit
    " conspiring"    1.10
    " violating"     1.06
    " being"         1.04
    " committing"    0.97
    " having"        0.96
    " hypocrisy"     0.96
    " wrongdoing"    0.95
    " abusing"       0.94
    " misconduct"    0.93
    " stealing"      0.91
    Activation Density: 0.053%

    No Known Activations