INDEX
    Explanations

    references to feelings of shame and associated concepts

    Negative Logits
    esser      -0.16
    ledo       -0.15
    zos        -0.14
    iets       -0.14
     Fritz     -0.14
    .          -0.14
    erno       -0.14
    vertime    -0.13
    icut       -0.13
    ockets     -0.13
    Positive Logits
    lessly     0.21
    fully      0.21
    ishly      0.16
    ously      0.16
    addock     0.15
    ulen       0.15
    .cx        0.15
    LOCKS      0.15
    broken     0.15
    слов       0.15
    Act Density 0.012%

    No Known Activations