    Explanations

    references to specific countries or nationalities

    Negative Logits
    velt       -0.15
    bersome    -0.15
    σμα        -0.15
    lech       -0.15
    ivalent    -0.14
    701        -0.14
    frei       -0.13
    wner       -0.13
    poons      -0.13
    ağa        -0.13
    Positive Logits
    ian          0.20
    can          0.18
    ifornia      0.18
    ican         0.17
    -Russian     0.17
    стан         0.17
    bian         0.17
    -American    0.16
    ish          0.16
    589          0.16
    Activation Density: 0.182%

    No Known Activations