INDEX
    Explanations

    mentions of derogatory terms or prejudiced language

    references to racism and bigotry

    Negative Logits
     backup         -0.82
     resil          -0.77
     GOODMAN        -0.74
     rotation       -0.74
     refurb         -0.73
    ufact           -0.73
     refin          -0.73
    oscopic         -0.70
     remod          -0.69
     renovations    -0.68
    Positive Logits
     bigot          1.03
     bigotry        0.97
     coward         0.92
    Semitic         0.88
     slurs          0.86
     unworthy       0.85
     perpetrated    0.85
     cowardly       0.84
     insin          0.80
     hypocrisy      0.79
    Act Density 0.978%

    No Known Activations