INDEX
    Explanations

    tokens after punctuation

    Negative Logits
    Token      Value
    " …,"      0.99
    " (,"      0.95
    "?,"       0.88
    " (),"     0.85
    "…,"       0.83
    " ()."     0.82
    " !,"      0.81
    " ?,"      0.80
    ",)"       0.80
    ",),"      0.79
    Positive Logits
    Token           Value
    <unused2010>    1.04
    <unused1216>    1.03
    <unused1148>    0.97
    <unused639>     0.96
    <unused2166>    0.95
    <unused947>     0.94
    <unused1071>    0.92
    <unused566>     0.92
    <unused1676>    0.92
    <unused271>     0.92
    Activation Density: 0.003%

    No Known Activations