INDEX
    Explanations

    instances of the word "shame" or related forms in various contexts

    Negative Logits
    Token             Logit
    "im"              -0.16
    "ended"           -0.16
    "anche"           -0.16
    "ekt"             -0.15
    "sin"             -0.15
    "324"             -0.15
    "are"             -0.15
    "ges"             -0.14
    "acio"            -0.14
    "Parm"            -0.14

    Positive Logits
    Token             Logit
    " sh"             0.43
    "(sh"             0.17
    ".sh"             0.17
    "rew"             0.17
    "enan"            0.17
    "sh"              0.17
    "-sh"             0.16
    "ogan"            0.15
    "rou"             0.15
    ".scalablytyped"  0.15
    Act Density 0.017%

    No Known Activations