    Negative Logits
    t                     1.29
    d                     1.04
    r                     0.95
    AND                   0.92
    (no visible token)    0.92
    ب                     0.88
    IG                    0.83
    د                     0.83
    (whitespace)          0.82
    (no visible token)    0.81
    Positive Logits
    ور                    1.05
    (no visible token)    0.98
    (no visible token)    0.93
    ет                    0.88
    وريا                  0.88
    .。                    0.86
    ция                   0.83
    (no visible token)    0.83
    (no visible token)    0.83
    ння                   0.82
    Activation Density: 0.108%

    No Known Activations