    Explanations

    phrases involving the word "of"

    Negative Logits
     Mehran         -0.77
    Score           -0.76
    alde            -0.72
    Zone            -0.67
    edin            -0.66
    ocket           -0.65
    reddits         -0.64
    oor             -0.64
     adjusts        -0.64
     Ange           -0.64
    Positive Logits
     hypocrisy      1.10
     conspiring     1.09
     violating      1.07
     being          1.06
     misrepresent   1.01
     neglect        0.95
     committing     0.94
     having         0.93
     wrongdoing     0.93
     misconduct     0.92
    Activation Density: 0.029%

    No Known Activations