    Explanations

    phrases related to discrimination and equality

    Negative Logits
    Token            Logit
    "тин"            -0.08
    "amil"           -0.07
    "žit"            -0.07
    "ingroup"        -0.07
    "ojí"            -0.07
    "eyi"            -0.07
    "жень"           -0.07
    "Při"            -0.07
    " 모두"          -0.07
    " Rosenstein"    -0.07
    Positive Logits
    Token            Logit
    " or"            0.07
    "ado"            0.06
    " whether"       0.06
    "是否"           0.06
    " alone"         0.06
    " lack"          0.06
    " conscience"    0.06
    " merely"        0.06
    " perceived"     0.06
    " observ"        0.05
    Act Density 0.009%

    No Known Activations