INDEX
    Explanations
    No Explanations Found
    Negative Logits
    ى    1.38
    ת    1.34
    ם    1.34
    ע    1.31
    ן    1.30
    ن    1.18
    (no visible token)    1.17
    н    1.15
    ב    1.15
    ம்    1.13
    Positive Logits
    (    0.99
    :    0.91
    <0xA4>    0.86
    да    0.86
    (no visible token)    0.82
    thest    0.81
    do    0.81
    <0x91>    0.78
    ail    0.73
    <0xA3>    0.72
    Act Density 0.000%
    No Known Activations