    NEGATIVE LOGITS
    Token            Logit
    " flirting"      -0.07
    " even"          -0.06
    "_CONTACT"       -0.06
    " burada"        -0.06
    " OSI"           -0.06
    " travelling"    -0.06
    " Existing"      -0.06
    " 역시"          -0.06
    " axiom"         -0.06
    "NavItem"        -0.06
    POSITIVE LOGITS
    Token            Logit
    " ال"            0.07
    " ανα"           0.06
    " immigrants"    0.06
    ".Microsoft"     0.06
    " thumb"         0.06
    " camp"          0.06
    " بین"           0.06
    (not shown)      0.06
    " dív"           0.06
    "trace"          0.06
    Activation Density: 0.148%

    No Known Activations