INDEX
    Explanations

    references to group identities or categories, particularly in discussions about people or organizations

    Negative Logits
    Token       Logit
    "ÏĦÏĮ"      -0.16
    "ÙĬب"       -0.14
    ".cast"     -0.14
    "اعد"       -0.14
    "lop"       -0.14
    "жа"        -0.13
    "åĨ"        -0.13
    "lick"      -0.13
    "ayet"      -0.13
    "abelle"    -0.13
    Positive Logits
    Token       Logit
    " besides"  0.18
    "neck"      0.18
    " than"     0.17
    "318"       0.17
    "Besides"   0.16
    "ello"      0.15
    " Besides"  0.15
    "/ws"       0.15
    "_INCREF"   0.15
    " niż"      0.15
    Activation Density: 0.315%

    No Known Activations