    Negative Logits
    Token           Value
    "u"             1.08
    "in"            1.02
    "i"             0.89
    "t"             0.88
    "the"           0.88
    "وں"            0.86
    " uniforme"     0.79
                    0.79
    "ā"             0.77
    " کننده"        0.75
    Positive Logits
    Token           Value
    "س"             0.89
    " washer"       0.88
    "с"             0.83
    "ре"            0.81
    "б"             0.75
    "<0x91>"        0.75
    " Washer"       0.74
    "ни"            0.71
    "ри"            0.70
    "ة"             0.70
    Activation Density: 0.001%

    No Known Activations