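    Negative Logits
        Token            Logit
        " که"            1.10
        " that"          1.06
        "dır"            1.02
        (not rendered)   1.02
        (not rendered)   1.02
        (not rendered)   0.89
        "اً"             0.88
        "dür"            0.87
        "1"              0.85
        "いた"            0.85

    Positive Logits
        Token            Logit
        "il"             1.36
        "on"             1.26
        "is"             1.23
        "i"              1.12
        "ir"             1.06
        " I"             1.05
        "ле"             1.05
        "p"              1.05
        "em"             1.04
        "/"              1.02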
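Positive and negative logit lists like these are commonly produced by projecting a feature's output (decoder) direction through the model's unembedding matrix and keeping the tokens with the largest and smallest scores. This page does not say how its lists were computed, so the sketch below is an assumption: `direction`, `unembed`, `vocab`, and both helper functions are hypothetical, written in plain Python to stay self-contained. It returns signed scores, whereas the values under Negative Logits above appear without a sign.

```python
from typing import List, Sequence, Tuple


def logit_scores(direction: Sequence[float],
                 unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a d_model-length direction through a (d_model x vocab) unembedding.

    Hypothetical helper: assumes unembed[i][t] is the weight from residual
    dimension i to vocabulary token t.
    """
    if len(direction) != len(unembed):
        raise ValueError("direction length must equal the number of unembedding rows")
    scores = [0.0] * len(unembed[0])
    for weight, row in zip(direction, unembed):
        for token_id, u in enumerate(row):
            scores[token_id] += weight * u
    return scores


def top_logits(direction: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and k most-suppressed tokens with signed scores."""
    if k <= 0:
        return [], []
    scores = logit_scores(direction, unembed)
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending by score
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # largest first
    negative = [(vocab[t], scores[t]) for t in order[:k]]  # most negative first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 3 tokens.
pos, neg = top_logits([1.0, -0.5],
                      [[0.2, 1.0, -0.3],
                       [0.4, -1.0, 0.6]],
                      ["a", "b", "c"], k=1)
print(pos, neg)  # [('b', 1.5)] [('c', -0.6)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized matrix product would replace the nested loops.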
    Activation Density: 0.001%
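The density figure is easiest to read as a rate: 0.001% is 0.00001, or roughly one token in 100,000. A minimal sketch, assuming density means the percentage of sampled token positions on which the feature's activation is strictly above zero; the `activation_density` helper and its `threshold` parameter are illustrative and not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density is measured per token position and
    that a position counts as active only when strictly above the threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of 100,000 sampled tokens.
acts = [0.0] * 100_000
acts[42] = 3.7
print(f"{activation_density(acts):.3f}%")  # 0.001%
```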

    No Known Activations