INDEX
    Explanations

    terms related to controversial and inflammatory language, including slurs and provocative statements

    terms related to controversial social and political identities

    Negative Logits
    "ispers"        -0.84
    "rams"          -0.80
    " suites"       -0.80
    "izens"         -0.75
    "Us"            -0.74
    " Shots"        -0.73
    "rils"          -0.73
    " patches"      -0.73
    " timelines"    -0.72
    " Lans"         -0.72
    Positive Logits
    " whore"        0.82
    " prostitute"   0.79
    " unto"         0.79
    "digy"          0.79
    " breaker"      0.77
    " himself"      0.74
    " believer"     0.74
    " pretending"   0.73
    "atical"        0.73
    "nik"           0.72
    Activation Density: 0.274%

    No Known Activations