INDEX
    Explanations

    words related to discrimination, bias, and prejudice

    terms associated with extreme or derogatory labeling and bias

    Negative Logits
    Token          Logit
    "aird"         -0.78
    "stellar"      -0.73
    "stable"       -0.72
    "frames"       -0.70
    " Indigo"      -0.69
    "quart"        -0.68
    "Sync"         -0.67
    "erald"        -0.67
    " Chrys"       -0.66
    "oglobin"      -0.65
    Positive Logits
    Token               Logit
    " tactics"          1.06
    " intimidation"     1.01
    " blackmail"        1.00
    " extortion"        0.90
    " spying"           0.89
    " perpetrated"      0.89
    " retaliation"      0.88
    " accusations"      0.88
    " abuses"           0.87
    " threats"          0.87
    Activation Density 0.283%

    No Known Activations