INDEX
    Negative Logits
    I       0.80
    EM      0.73
    ۰       0.72
    OOOO    0.70
    N       0.66
    Emp     0.65
    LY      0.65
    HEAD    0.64
    HOL     0.64
    A       0.63
    Positive Logits
    ن       1.15
    ط       1.08
    د       0.95
    تالي    0.94
            0.91
    رك      0.90
    رت      0.88
    ر       0.86
    ني      0.85
     diferite  0.84
    Activation Density: 0.001%

    No Known Activations