INDEX
    Explanations

    foreign languages and characters

    NEGATIVE LOGITS
    Token               Value
    " üzeri"            0.38
    (no token shown)    0.38
    "vrir"              0.37
    "Miche"             0.37
    "自身的"            0.37
    " consistently"     0.37
    "ID"                0.35
    " requiring"        0.35
    "onial"             0.35
    "وريا"              0.35
    POSITIVE LOGITS
    Token               Value
    "ല്ലാ"               0.46
    " भग्न"              0.42
    "ить"               0.39
    " Дж"               0.39
    "наче"              0.39
    (no token shown)    0.39
    "다가"              0.38
    " trzec"            0.38
    " சிகிச்ச"           0.37
    " फांसी"             0.37
    Activation Density: 0.001%

    No Known Activations