    Negative Logits
        Token     Logit
        "ح"       1.19
        " It"     1.06
        " are"    1.04
        "ile"     1.02
        "ene"     1.00
        "س"       0.99
        "are"     0.97
        "ige"     0.93
        " ("      0.91
        "ype"     0.91
    Positive Logits
        Token                  Logit
        "ٹ"                    1.62
        "те"                   1.54
        "r"                    1.47
        "يد"                   1.43
        [token not rendered]   1.41
        "щий"                  1.40
        "p"                    1.40
        "ية"                   1.37
        "ви"                   1.35
        "ר"                    1.34
    Act Density 0.025%

    No Known Activations