INDEX
    Explanations

    sequences of repeated letters

    expressions of excitement or surprise

    Negative Logits
    ourse           -0.76
    "]=>            -0.64
     contribut      -0.62
    ãĥĹ             -0.61
     substitution   -0.60
    IAL             -0.59
     vacated        -0.57
     Redux          -0.57
    occup           -0.57
     Hasan          -0.57
    Positive Logits
    mmmm            1.04
    mmm             0.91
    oooo            0.91
     kidding        0.87
    ooo             0.87
    ahah            0.84
    hhhh            0.82
    aaaa            0.79
    hhh             0.79
    !!!!!           0.78
    Activation Density: 0.355%

    No Known Activations