INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Value
    t               1.73
    (               1.18
    (not rendered)  1.05
    ير              1.03
    v               1.01
    مت              0.99
    hut             0.96
    IE              0.94
    tank            0.93
    ent             0.93
    POSITIVE LOGITS
    Token           Value
    (not rendered)  1.53
    (not rendered)  1.37
    ف               1.33
    ب               1.26
    (not rendered)  1.19
    (not rendered)  1.16
     you            1.11
    (not rendered)  1.11
    คุณ             1.10
    ıları           1.09
    Act Density 0.069%

    No Known Activations