INDEX
    Explanations

    references to various nationalities or ethnic groups

    Negative Logits
    Token      Logit
    eson       -0.18
    lech       -0.17
    bidden     -0.16
    sar        -0.15
    aldi       -0.15
    srv        -0.15
    ductory    -0.14
    less       -0.14
    enheim     -0.14
    lessly     -0.14
    Positive Logits
    Token         Logit
    -American     0.29
    -Americans    0.21
    -Russian      0.21
    -flag         0.20
    -born         0.20
    ization       0.18
    ischer        0.18
    ness          0.17
    ized          0.17
    -made         0.16
    Activation Density: 0.258%

    No Known Activations