INDEX
    Explanations
    Negative Logits
        " is"       1.04
        " at"       0.78
        "2"         0.75
        " ("        0.71
        "5"         0.69
        " of"       0.69
        " was"      0.68
        "nement"    0.68
        "4"         0.67
        " on"       0.66
    Positive Logits
        "i"         1.27
        "ת"         1.15
        "in"        1.07
        "us"        1.07
        "as"        1.05
                    1.04
        "k"         1.02
        "ي"         0.98
        "is"        0.98
        "ت"         0.94
    Act Density 0.001%

    No Known Activations