INDEX
    NEGATIVE LOGITS
    "is"    1.69
    "un"    1.59
    "es"    1.57
            1.44
    "ar"    1.36
    "-"     1.36
    "o"     1.35
    "en"    1.34
    "of"    1.34
    "an"    1.30
    POSITIVE LOGITS
    " to"         1.25
    " at"         1.19
    " whatnot"    1.02
    "0"           1.02
    "8"           1.02
    "9"           0.99
    "AY"          0.90
    "۰۰"          0.89
    "ам"          0.87
    "\}$,"        0.86
    Activation Density: 0.001%

    No Known Activations