    Explanations

    Instances of the word "sh" in varying contexts, suggesting a focus on expressions of shock or surprise.

    Negative Logits
    Token       Logit
    "strup"     -0.09
    "stru"      -0.08
    "hend"      -0.08
    "arend"     -0.07
    "mpar"      -0.07
    "imbus"     -0.07
    ".resp"     -0.07
    "wend"      -0.07
    "iaux"      -0.07
    "ysa"       -0.07
    Positive Logits
    Token       Logit
    "es"        0.07
    " warm"     0.07
    " sh"       0.06
    " Bones"    0.06
    "sh"        0.06
    "allow"     0.06
    "ales"      0.06
    " Force"    0.06
    " hybrid"   0.06
    " coun"     0.05
    Activation Density: 0.009%

    No Known Activations