INDEX
    Explanations
    No Explanations Found
    Negative Logits
    ل  1.45
    الن  1.27
    ز  1.17
    1.17
    ל  1.16
    ו  1.15
    ر  1.07
    غ  1.07
    rp  1.06
    л  1.05
    Positive Logits
    ные  1.07
     of  1.04
    1.01
    1.00
    0.96
     an  0.95
     the  0.95
     viral  0.93
    the  0.91
    of  0.89
    Act Density 0.000%

    No Known Activations