INDEX
    Explanations

    phrases indicating surprise or disbelief

    the phrase "don't even" and variations emphasizing denial or lack of awareness

    Negative Logits
    Token                 Logit
    "rend"                -0.81
    "ugal"                -0.75
    "cipl"                -0.70
    "ubi"                 -0.67
    "runtime"             -0.66
    "ãĤº"                 -0.66
    "=-=-=-=-=-=-=-=-"    -0.63
    "acha"                -0.63
    "edient"              -0.63
    "only"                -0.62
    Positive Logits
    Token                 Logit
    " remotely"           1.33
    " bothering"          1.03
    " bothered"           0.99
    " bother"             0.98
    " close"              0.84
    " mentioning"         0.83
    " scratch"            0.81
    " mention"            0.81
    " halfway"            0.80
    " pretend"            0.79
    Activation Density: 0.057%

    No Known Activations