INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "ない"        1.00
    "ی"           0.95
    "ంటి"         0.89
    "ीकरण"        0.86
    "دە"          0.84
    (blank)       0.83
    "EM"          0.81
    " robbing"    0.81
    (blank)       0.80
    "te"          0.79
    Positive Logits
    "or"          1.30
    " purposes"   1.10
    "ır"          0.99
    "і"           0.91
    "R"           0.89
    "ac"          0.87
    "х"           0.86
    "giveness"    0.86
    "یا"          0.85
    "ѕ"           0.84
    Activation Density: 0.179%

    No Known Activations