INDEX
    Explanations
    Negative Logits
     admire    -0.07
     visita    -0.07
    memcmp     -0.07
     "/";↵     -0.07
    الي        -0.07
     stared    -0.06
     방문      -0.06
    天堂       -0.06
     diş       -0.06
     погод     -0.06
    Positive Logits
     minorities    0.07
     نسخ           0.07
    character      0.06
     rot           0.06
    .percent       0.06
    Exp            0.06
     option        0.06
    minutes        0.06
     أق            0.06
    0.06
    Activation Density: 0.002%

    No Known Activations