INDEX
    Explanations
    No Explanations Found
    Negative Logits
        Token    Logit
        "i"      1.16
        "ir"     1.07
        "ي"      1.05
        "IS"     0.96
        "نا"     0.92
        "ش"      0.88
        "k"      0.82
        "m"      0.81
        "ى"      0.80
        "ر"      0.78
    Positive Logits
        Token        Logit
        "с"          0.75
        "кий"        0.68
        "sembled"    0.67
        (blank)      0.66
        "ём"         0.65
        "1"          0.63
        "ди"         0.62
        (blank)      0.59
        " եր"        0.59
        " ತಮ್ಮ"      0.58
    Activation Density: 4.712%

    No Known Activations