    Negative Logits
    ق    1.64
    t    1.44
    il    1.41
    ك    1.32
    c    1.31
    w    1.30
    q    1.23
    ва    1.22
    ع    1.20
    غ    1.20
    Positive Logits
    (token not shown)    1.05
    IR    0.99
    SS    0.97
    OC    0.96
     Setiap    0.95
     that    0.93
     Banyak    0.92
    AT    0.91
    CS    0.87
    RS    0.87
    Act Density 0.001%

    No Known Activations