INDEX
    Explanations
    No Explanations Found
    Negative Logits
    يا
    1.21
    يك
    1.19
    и
    1.08
    1.04
    0.97
    ي
    0.96
     നിന്ന്
    0.93
    0.91
    0.91
    ge
    0.90
    Positive Logits
    )
    1.49
    a
    1.36
    ing
    1.27
     an
    1.25
    ien
    1.23
     a
    1.20
    یم
    1.20
    ک
    1.18
     can
    1.17
    ain
    1.14
    Act Density 0.000%

    No Known Activations