INDEX
    Explanations

    expressions related to personal identity and self-reflection

    Negative Logits
    Token            Logit
    loat             -0.15
    icky             -0.14
    leans            -0.14
    ovice            -0.14
     recommendation  -0.14
    evil             -0.13
    apos             -0.13
    ãģ¨ãģĨ           -0.13
    ırak             -0.13
    odyn             -0.13
    Positive Logits
    Token            Logit
     being           0.23
     reality         0.21
     actions         0.21
     existence       0.20
     Being           0.20
     own             0.20
     humanity        0.19
     environment     0.19
     worth           0.18
     past            0.18
    Activation Density: 0.234%

    No Known Activations