INDEX
    Explanations

    conversational dialogues

    expressions of disbelief or surprise

    Negative Logits
     unquestion        -0.63
     undeniably        -0.62
    utterstock         -0.58
    virt               -0.58
     strikingly        -0.57
    respective         -0.56
     predictably       -0.55
     unsurprisingly    -0.53
     uniformly         -0.53
     markedly          -0.52
    Positive Logits
     fuckin            0.95
     gonna             0.92
     haha              0.81
     wanna             0.79
    -"                 0.77
     kinda             0.76
    laughs             0.74
     gotta             0.74
     ya                0.74
    hin                0.74
    Activation Density 0.975%

    No Known Activations