INDEX
    Explanations

    references to nationalities or ethnicities

    Negative Logits
    Token         Logit
    " Tot"        -0.15
    "aint"        -0.15
    "IPC"         -0.15
    "داشت"        -0.15
    "мот"         -0.15
    "abin"        -0.14
    "deer"        -0.14
    "婆"          -0.14
    "átor"        -0.14
    "opsis"       -0.14

    Positive Logits
    Token         Logit
    "abra"         0.16
    " carrier"     0.15
    " Roch"        0.15
    "ROWS"         0.15
    "jez"          0.15
    "ADDE"         0.14
    "ραν"          0.14
    "ुभ"           0.14
    "-local"       0.14
    "-span"        0.13

    Act Density 0.024%

    No Known Activations