    Explanations

    words that mark interruptions or interjections in conversation, particularly phrases starting with "Oh," and ", and"

    repeated phrases or expressions of surprise or emphasis

    Negative Logits
    ascript           -0.86
    ļéĨĴ              -0.70
    resso             -0.66
    İĭ                -0.66
    ome               -0.65
    inction           -0.64
    emo               -0.64
    UGC               -0.63
    robe              -0.63
    ForgeModLoader    -0.63
    Positive Logits
     yeah              1.05
     yes               0.93
     sir               0.85
     thank             0.82
     sorry             0.82
     uh                0.81
     dear              0.79
     Wait              0.77
     hello             0.75
     please            0.73
    Activation Density: 0.103%

    No Known Activations