INDEX
    Explanations
    NEGATIVE LOGITS
    ה  1.52
    이지만  1.48
    ك  1.45
       1.44
    يد  1.38
    Д  1.36
    the  1.31
    ти  1.31
    Пре  1.30
    이면  1.28
    POSITIVE LOGITS
    ong  1.41
    .  1.29
    ede  1.24
    regation  1.14
    aren  1.12
    د  1.09
    ested  1.07
    ır  1.05
    hearted  1.05
    ete  1.03
    Act Density 0.000%

    No Known Activations