INDEX
    Explanations

    phrases related to accusations

    repeated instances of accusations against entities or individuals

    Negative Logits
    Token          Logit
    "ower"         -0.76
    " partName"    -0.75
    "alde"         -0.70
    "aware"        -0.66
    "owers"        -0.65
    "reddits"      -0.65
    " adjusts"     -0.64
    " Mehran"      -0.64
    " threshold"   -0.63
    " fades"       -0.63
    Positive Logits
    Token          Logit
    " disl"        0.75
    " brutality"   0.74
    " murdering"   0.74
    " gou"         0.72
    " wrongdoing"  0.72
    " treason"     0.72
    " conspiring"  0.70
    " committing"  0.70
    " racism"      0.68
    " witchcraft"  0.68
    Activation Density: 0.052%

    No Known Activations