    Explanations

    derogatory language or remarks

    terms related to racial slurs and derogatory language

    Negative Logits
    Token            Logit
    "compan"         -0.70
    "session"        -0.68
    "angel"          -0.66
    "NetMessage"     -0.62
    "ocr"            -0.61
    "growth"         -0.61
    " reconc"        -0.61
    "packing"        -0.61
    " Folder"        -0.61
    "Whe"            -0.60
    Positive Logits
    Token            Logit
    " slurs"         1.43
    " slur"          1.24
    "pees"           0.80
    "plings"         0.80
    "rimination"     0.78
    " dispar"        0.76
    "iple"           0.74
    "ï¸ı"            0.73
    " guiActiveUn"   0.72
    " insults"       0.71
    Activation Density: 0.013%

    No Known Activations