    Explanations
    No Explanations Found
    Negative Logits
    (not rendered)   0.48
    "ла"             0.46
    "u"              0.43
    (not rendered)   0.40
    "ون"             0.39
    "in"             0.39
    "p"              0.37
    "im"             0.36
    "و"              0.36
    "us"             0.35
    Positive Logits
    " a"     0.62
    " "      0.57
    " is"    0.54
    " at"    0.48
    " an"    0.43
    " of"    0.38
    " to"    0.38
    " the"   0.35
    " {"     0.34
    " la"    0.34
    Activation Density: 18.890%

    No Known Activations