INDEX
    Explanations

    phrases related to disbelief or astonishment

    expressions of disbelief or questioning reality

    Negative Logits
    Token       Logit
    kefeller    -0.69
    Lic         -0.64
    athered     -0.63
    ourses      -0.61
    ithe        -0.61
    odox        -0.59
    marg        -0.59
    haps        -0.59
    agonists    -0.59
    umbnails    -0.58

    Positive Logits
    Token       Logit
    !"          1.10
    !!!!!       1.09
     haha       1.05
     :)         1.05
    !!"         1.05
     ;)         1.01
    !".         0.98
    !'          0.97
    !!!!        0.97
    ?!"         0.96
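
    The two logit tables list the vocabulary tokens whose output logits this feature pushes up most (positive) and down most (negative). A common way to derive such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k highest and k lowest scores. The function name `top_and_bottom_logits` and the toy vectors below are hypothetical, chosen only to illustrate the ranking step.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score every token by the dot product of the
    feature's decoder direction with that token's unembedding vector,
    then return the k highest-scoring and k lowest-scoring tokens."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }
    # Highest scores first: tokens the feature promotes.
    positive = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    # Lowest scores first: tokens the feature suppresses.
    negative = sorted(scores.items(), key=lambda item: item[1])[:k]
    return positive, negative


# Toy example with a 2-dimensional residual stream (made-up numbers).
decoder = [1.0, -0.5]
vocab = {
    "!": [1.0, 0.0],
    " haha": [0.8, 0.0],
    "odox": [-1.0, 0.5],
    "Lic": [0.0, 1.0],
}
pos, neg = top_and_bottom_logits(decoder, vocab, k=2)
print(pos)  # [('!', 1.0), (' haha', 0.8)]
print(neg)  # [('odox', -1.25), ('Lic', -0.5)]
```

    Note that leading spaces are part of the token (" haha" and "haha" are different vocabulary entries), which is why some tokens in the tables above are indented by one extra character.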

    Activation Density: 0.609%
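
    Activation density is usually read as the share of sampled tokens on which the feature fires; that definition (activation strictly above zero) is an assumption here, not something this page states. Under it, 0.609% means roughly 6 firing tokens per 1,000. The helper `activation_density` below is a hypothetical sketch of that calculation.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens whose activation
    exceeds `threshold` (assumed to be 0, i.e. the feature fires)."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in values if a > threshold)
    return 100.0 * firing / len(values)


# 609 firing tokens out of 100,000 sampled tokens.
sample = [1.0] * 609 + [0.0] * 99_391
print(activation_density(sample))
```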

    No Known Activations