INDEX
    Explanations

    phrases related to criticism and attacks towards individuals or groups

    aggressive language or terms associated with criticism and attacks

    Negative Logits
    pection     -0.76
    hazard      -0.72
    cano        -0.72
    duct        -0.70
    poral       -0.70
    lycer       -0.70
    yip         -0.69
    earchers    -0.69
    trap        -0.69
    kj          -0.69
    Positive Logits
     critics             1.01
     liberals            0.93
     feminists           0.93
     commenters          0.91
     fellow              0.90
     Republicans         0.90
     politicians         0.89
     Islam               0.88
     environmentalists   0.88
     gays                0.87
    Activation Density: 0.220%

    No Known Activations