INDEX
    Explanations

    questions and expressions of surprise

    expressions of surprise or disbelief

    Negative Logits
     diffusion     -0.72
    ezvous         -0.71
     preference    -0.64
     coasts        -0.59
    idem           -0.58
     favors        -0.58
     Nieto         -0.58
     mutually      -0.58
    aturday        -0.56
     liber         -0.55
    Positive Logits
    ?!             1.09
    ?!"            1.05
    ?)             1.00
    Huh            1.00
    ???            0.99
    ?".            0.97
    !?"            0.96
    Why            0.96
    ?).            0.95
    !?             0.94
    Activation Density: 0.446%

    No Known Activations