INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "1"           0.82
    " as"         0.81
    "with"        0.75
    " with"       0.74
    " for"        0.71
    "ar"          0.69
    "as"          0.66
    "is"          0.65
    "ado"         0.65
    " on"         0.63
    Positive Logits
    " saddened"   0.63
    " در"         0.61
    ")$."         0.56
    " في"         0.55
    " wretched"   0.53
    "ו"           0.53
                  0.53
                  0.53
    " ד"          0.52
                  0.52
    Act Density 9.143%

    No Known Activations