INDEX
    Explanations
    NEGATIVE LOGITS
    \           1.88
    '           1.75
    :           1.71
    -           1.57
    ?           1.55
    "           1.48
    )           1.46
    (           1.42
    ;           1.41
    ,           1.37
    POSITIVE LOGITS
    s           1.59
    first       1.51
    to          1.43
    as          1.26
    in          1.19
    named       1.19
    not         1.17
    default     1.16
    text        1.14
    underwear   1.14
    Activation Density: 0.596%

    No Known Activations