    Negative Logits
    lobs        -0.08
     Kevin      -0.07
     часов      -0.07   (Russian: "hours")
    isk         -0.06
    truth       -0.06
     kissing    -0.06
    ونية        -0.06   (Arabic word fragment)
     века       -0.06   (Russian: "century")
     forth      -0.06
     spat       -0.06
    Positive Logits
    тех          0.07   (Russian: "those")
     clickable   0.06
    ">$          0.06
                 0.06
    AGR          0.06
    들에게       0.06   (Korean: "to [them]")
    =G           0.06
    ;}           0.06
     국민        0.06   (Korean: "citizens, the people")
    _UC          0.06
    Activation Density: 0.033%

    No Known Activations