    Explanations

    personal pronouns and references to individual identity or possession

    Negative Logits
    Token       Logit
    "lj"        -0.15
    "oko"       -0.15
    " hip"      -0.14
    "hip"       -0.14
    " Hip"      -0.14
    "piler"     -0.14
    "ëıĦ"       -0.14
    "arak"      -0.14
    " diplom"   -0.14
    " Ney"      -0.14
    Positive Logits
    Token         Logit
    "iosa"        0.17
    "anmar"       0.16
    "irut"        0.15
    " defenses"   0.15
    "chet"        0.14
    "untas"       0.14
    "ürn"         0.14
    "ãĤ¨ãĥ«"      0.14
    "ocard"       0.14
    "etime"       0.14
    Activation Density: 0.411%

    No Known Activations