INDEX
    Explanations
    NEGATIVE LOGITS
     I      1.06
     It     1.00
     (      0.96
    м       0.95
     The    0.94
    بی      0.93
    !       0.93
            0.93
     A      0.92
    к       0.91
    POSITIVE LOGITS
    ul      1.08
    is      1.04
    t       1.04
    a       0.99
    to      0.96
    on      0.93
    d       0.89
    i       0.86
    ou      0.84
    iz      0.84
    Act Density 0.000%

    No Known Activations