    NEGATIVE LOGITS
    "ти"      1.03
    "خ"       0.95
    "ش"       0.91
    "其他"    0.89
    "ل"       0.89
    (token not captured)  0.87
    "ب"       0.86
    "أ"       0.84
    "د"       0.83
    "("       0.82

    POSITIVE LOGITS
    " are"    1.40
    " is"     1.31
    "d"       1.14
    " "       0.99
    "j"       0.98
    "t"       0.93
    "dır"     0.87
    " has"    0.86
    "on"      0.85
    "y"       0.85

    Activation density: 0.662%

    No Known Activations