INDEX
    Explanations

    words indicating nationalities or ethnic identities

    Negative Logits
    Token         Logit
    "ord"         -0.15
    " the"        -0.14
    "aine"        -0.14
    " tam"        -0.14
    "ourcem"      -0.14
    "less"        -0.13
    "lessness"    -0.13
    "in"          -0.13
    " _"          -0.13
    "ifs"         -0.13
    Positive Logits
    Token         Logit
    "-American"   0.17
    "-Russian"    0.16
    "ization"     0.16
    "kest"        0.15
    "-flag"       0.15
    "ize"         0.14
    "iqueta"      0.14
    "issan"       0.14
    "throp"       0.14
    "izes"        0.14
    Activation Density: 0.145%

    No Known Activations