    NEGATIVE LOGITS
    E                   1.09
     on                 1.01
    S                   0.97
    P                   0.93
                        0.90
    L                   0.89
    YX                  0.87
    ED                  0.86
    EW                  0.86
    EH                  0.86
    POSITIVE LOGITS
    f                   1.18
    at                  1.06
    is                  1.04
    و                   1.03
    og                  1.01
    ו                   1.00
    ur                  0.98
    us                  0.98
    atation             0.97
     characterized      0.93
    Activation Density: 0.051%

    No Known Activations