INDEX
    Explanations

    list items and contexts

    Negative Logits
    Token       Logit
    " is"       2.17
    " to"       2.14
    " a"        1.65
    " on"       1.63
    " "         1.57
    " ("        1.32
    " of"       1.20
    " على"      1.02
    "kannya"    0.98
    "ä"         0.96
    Positive Logits
    Token       Logit
    "for"       1.79
    "و"         1.73
    "in"        1.70
    "and"       1.66
    (missing)   1.52
    "ق"         1.48
    "For"       1.48
    "us"        1.47
    (missing)   1.47
    (missing)   1.44
    Activation Density: 1.072%

    No Known Activations