INDEX
    Explanations
    NEGATIVE LOGITS
    batim
    -0.08
    riors
    -0.08
    ทร
    -0.07
     бө
    -0.07
     ерек
    -0.07
    ERRIDE
    -0.07
    шае
    -0.07
     feel
    -0.07
     сақтау
    -0.07
     محفوظ
    -0.07
    POSITIVE LOGITS
     liên
    0.08
     neighbour
    0.07
     participant
    0.07
     lips
    0.07
     dui
    0.07
     fk
    0.07
     based
    0.07
    цен
    0.07
    0.07
     dk
    0.07
    Act Density 0.001%

    No Known Activations