INDEX
    Explanations

    references to various nationalities or groups of people

    Negative Logits
    Token          Logit
    "gger"         -0.18
    "lessly"       -0.18
    "illard"       -0.17
    "ãĥijãĥ³"       -0.15
    "acular"       -0.15
    "olson"        -0.15
    "ayer"         -0.14
    "atform"       -0.14
    "ácil"         -0.14
    "人åijĺ"        -0.14
    Positive Logits
    Token          Logit
    "-American"    0.17
    " who"         0.16
    "anness"       0.15
    "-only"        0.15
    "ischer"       0.14
    "-made"        0.14
    "tons"         0.14
    " addslashes"  0.14
    ".gwt"         0.13
    " fing"        0.13
    Activation Density: 0.097%

    No Known Activations