INDEX
    NEGATIVE LOGITS
    Token                   Value
    'el'                    1.04
    'ک'                     0.93
    ' as'                   0.92
    'ă'                     0.87
    'as'                    0.80
    (token not captured)    0.79
    ' to'                   0.78
    'ة'                     0.78
    'ال'                    0.77
    ' at'                   0.77
    POSITIVE LOGITS
    Token                   Value
    '('                     0.93
    't'                     0.86
    'تهم'                   0.79
    '(.*'                   0.79
    'ように'                0.77
    ' multiport'            0.75
    '(,'                    0.74
    (token not captured)    0.74
    'تان'                   0.73
    '(",'                   0.72
    Activation Density: 0.023%

    No Known Activations