INDEX
    Explanations
    NEGATIVE LOGITS
    ہ    1.17
     i    1.13
    ta    1.13
    u    1.09
    IS    1.04
    i    1.01
    ל    1.01
    𝖆    0.99
    0    0.98
    AR    0.97
    POSITIVE LOGITS
    ни    1.24
     وعلى    1.14
    ির    1.07
    ामुळे    1.07
    логии    1.04
    ान    1.02
    десят    1.02
     trebui    1.00
    습니다    1.00
     кстати    1.00
    Act Density 1.239%

    No Known Activations