INDEX
    Explanations

    references to specific cultural or social identifiers

    Negative Logits
    Token       Logit
    "iya"       -0.18
    "ija"       -0.17
    "patial"    -0.17
    "iena"      -0.16
    "rah"       -0.16
    "å¥ī"       -0.16
    "usters"    -0.15
    "ÙĬÙĬÙĨ"    -0.15
    " Mant"     -0.15
    "iy"        -0.15
    Positive Logits
    Token       Logit
    "emy"       0.31
    "ÄĻ"        0.30
    "Äħ"        0.29
    "eli"       0.26
    "enn"       0.26
    "elib"      0.26
    "emie"      0.24
    "ÄĻż"       0.24
    "enny"      0.23
    "ÅĦ"        0.23
    Activation Density 0.015%

    No Known Activations
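
Several tokens in the logit lists (for example ÄĻ, Äħ, ÅĦ) look like the printable stand-ins that byte-level BPE tokenizers use for raw UTF-8 bytes. The sketch below assumes the tokenizer behind this dashboard uses the GPT-2 style byte-to-character table; the source does not confirm this. The helper name `decode_token` is hypothetical. It maps each displayed character back to its byte and decodes the result as UTF-8. Tokens containing characters outside the table raise a `KeyError`.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed token back into readable text.

    Assumes every character of `token` is in the GPT-2 style table.
    Incomplete UTF-8 sequences are rendered with the replacement character.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["emy", "ÄĻ", "Äħ", "ÅĦ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as "emy" pass through unchanged, so the helper can be applied to a whole logit list without special-casing.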