    Negative Logits
    "f"                0.45
    "that"             0.43
    [token missing]    0.43
    [token missing]    0.39
    [token missing]    0.39
    "ת"                0.39
    "d"                0.38
    "ل"                0.38
    "z"                0.37
    "ुभ"               0.36
    Positive Logits
    " be"    0.66
    " "      0.59
    " \"     0.53
    "\"      0.49
    " e"     0.48
    " on"    0.48
    " of"    0.47
    " to"    0.46
    "<"      0.44
    " {"     0.43
    Activation Density: 0.397%

    No Known Activations