INDEX
    Explanations

    phrases that indicate discrimination or biased judgments based on various criteria such as gender or race

    Negative Logits
    uckland    -0.85
    hiba       -0.74
    jet        -0.70
    along      -0.69
    soon       -0.68
    Edit       -0.67
    Jet        -0.64
    jam        -0.64
    bats       -0.64
    nin        -0.63
    Positive Logits
     nationality    1.04
     ethnicity      0.99
     gender         0.90
     sheer          0.90
     resemblance    0.88
     conscience     0.86
     merit          0.86
     whim           0.85
     principles     0.85
     disability     0.82
    Act Density 0.075%

    No Known Activations