    Explanations

    anti-discrimination laws and practices

    Negative Logits
    Token             Logit
    "laye"            -0.92
    "ITICAL"          -0.90
    " terle"          -0.88
    "stanz"           -0.80
    "Յ"               -0.80
    " karş"           -0.79
    "SharedModule"    -0.77
    " criticizing"    -0.76
    "oblotting"       -0.76
    "lxt"             -0.76
    Positive Logits
    Token               Logit
    " discrimination"   4.22
    " discriminatory"   3.63
    " discrimin"        3.63
    " discriminate"     3.53
    " discriminated"    3.36
    " discriminating"   3.33
    " Discrimination"   3.22
    "discrimination"    3.16
    "discrimin"         3.13
    " Discrimin"        2.83
    Activation Density: 0.045%

    No Known Activations