    Negative Logits
    Token               Value
    "t"                 2.16
    (blank in source)   1.77
    "ف"                 1.70
    "פ"                 1.66
    (blank in source)   1.64
    "م"                 1.62
    "ты"                1.61
    "ما"                1.60
    (blank in source)   1.59
    "т"                 1.58
    Positive Logits
    Token               Value
    "ير"                2.02
    " sämt"             2.02
    "𝑺"                 2.00
    " tatsächlich"      1.98
    "ológicas"          1.97
    " kraju"            1.97
    " allerlei"         1.94
    "៉ុ"                1.89
    " pribli"           1.88
    "প্রসঙ্গত"            1.88
    Activation Density: 0.011%

    No Known Activations