    NEGATIVE LOGITS
    Token             Logit
    "t"               1.24
    "c"               1.00
    "h"               0.89
    "போது"            0.81
    "Keep"            0.79
    "tanh"            0.77
    " особенности"    0.76
    "သည်။"            0.75
    "出会"            0.75
    "の発"            0.75
    POSITIVE LOGITS
    Token       Logit
    "на"        1.42
    " shown"    1.26
    "shown"     1.18
                1.14
    "ب"         1.09
    "за"        1.06
    "يل"        1.05
    "iskt"      1.02
    "е"         1.00
    "وت"        0.99
    Activation Density: 0.003%

    No Known Activations