INDEX
    Explanations

    expressions of frustration and commentary related to specific tasks or systems

    ending in "self" or "selves"

    myself, oneself, himself

    Negative Logits
    ”.          -0.64
    )”.         -0.61
    ).[         -0.59
    “.          -0.58
    =".         -0.57
    ,“          -0.56
    ”).         -0.56
     ”.         -0.56
    !】         -0.56
    ofluor      -0.55
    Positive Logits
     myſelf        1.01
     itſelf        0.88
     tbh           0.86
     myself        0.84
     himſelf       0.82
     themſelves    0.78
    ſelf           0.77
     pleaſure      0.76
     haha          0.75
     hehe          0.71
    Act Density 0.631%

    No Known Activations