INDEX
    Explanations

    phrases related to investigations or allegations

    repeated phrases that indicate allegations of wrongdoing

    Negative Logits
    Token               Logit
    "uristic"           -0.90
    "ieties"            -0.76
    "ertodd"            -0.74
    "hap"               -0.71
    " folds"            -0.70
    "heses"             -0.68
    "nets"              -0.68
    "Tokens"            -0.68
    " partName"         -0.68
    "keys"              -0.68

    Positive Logits
    Token               Logit
    " inacc"            0.95
    " wrongdoing"       0.88
    " harassment"       0.80
    " misconduct"       0.75
    " misinformation"   0.74
    " discrimination"   0.71
    " criminality"      0.71
    " violence"         0.71
    " vandalism"        0.70
    " foul"             0.70
    Activation Density: 0.158%

    No Known Activations