    Explanations

    expressions of frustration or annoyance, such as phew, ugh, oh, and sighs

    Negative Logits
    iers          -0.42
    icts          -0.41
    iership       -0.39
    jri           -0.39
    ":[{"         -0.39
    ifles         -0.39
    jug           -0.38
     sanctioned   -0.38
    forming       -0.38
    ensis         -0.38
    Positive Logits
    HHHH          0.55
    hhhh          0.54
    hhh           0.52
    hh            0.49
     goodbye      0.44
     Clockwork    0.43
    athe          0.43
    awk           0.42
    ouls          0.41
    yeah          0.41
    Act Density 5.800%

    No Known Activations