    Explanations

    concepts related to feelings of shame and expressions of discontent

    Negative Logits
    Token          Logit
    "ãģĬãĤĬ"       -0.19
    "ehir"         -0.17
    "çħ§"          -0.17
    "_shader"      -0.16
    "sic"          -0.16
    "sen"          -0.16
    "lod"          -0.15
    "neh"          -0.15
    " extremes"    -0.15
    "ation"        -0.15
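Some tokens in the negative-logit list, such as "ãģĬãĤĬ" and "çħ§", look garbled because they are displayed in the byte-level BPE alphabet rather than as decoded text. The page does not name the tokenizer; the sketch below assumes a GPT-2-style byte-to-unicode table. Under that assumption "ãģĬãĤĬ" decodes to the Japanese kana "おり" and "çħ§" to the kanji "照". The names `bytes_to_unicode`, `CHAR_TO_BYTE`, and `decode_token` are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping each byte value 0-255 to a printable character."""
    # Printable bytes map to the character with the same code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # errors="replace" covers tokens that hold only part of a multi-byte character.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãģĬãĤĬ", "çħ§", "ehir"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as "ehir" pass through unchanged, because printable bytes map to themselves in this table.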
    Positive Logits
    Token          Logit
    "pherd"         0.19
    "peare"         0.19
    "cro"           0.17
    "akespeare"     0.16
    "ampoo"         0.16
    "ppard"         0.16
    "ered"          0.15
    "tember"        0.15
    "/bl"           0.15
    "orthand"       0.15
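The negative and positive logit lists rank vocabulary tokens by how strongly this feature is estimated to push their output logits down or up. The page does not say how these scores are computed. Assuming this is a sparse-autoencoder feature, a common approach is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive entries. The toy sketch below uses hypothetical names (`decoder_dir`, `unembed`, `top_logit_effects`) and tiny made-up matrices in plain Python. It is not the dashboard's actual pipeline.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by the direct effect of a feature direction on the output logits.

    decoder_dir: list of d_model floats (the feature's hypothetical decoder vector).
    unembed: d_model rows, each with one float per vocab token (hypothetical W_U).
    Returns (most_negative, most_positive), each a list of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Score for token j = dot product of the decoder direction with column j of W_U.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # lowest scores first
    most_positive = ranked[::-1][:k]    # highest scores first
    return most_negative, most_positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -1.0]
unembed = [
    [0.5, 0.1, -0.2, 0.0],
    [0.1, 0.3, 0.4, 0.0],
]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Real pipelines may also normalize the decoder direction or apply a final layer norm first. This sketch omits both.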
    Activation Density: 0.183%
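The activation density figure is the share of tokens on which the feature fires, so 0.183% corresponds to roughly 1 active token in every 546. The page does not give the firing threshold or the token dataset. The sketch below assumes "fires" means an activation strictly above zero, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


acts = [0.0, 0.0, 0.5, 0.0, 1.2]
print(activation_density(acts))  # 40.0
```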

    No Known Activations