INDEX
    Explanations

    terms and concepts related to self-awareness and self-identity

    Negative Logits
    Token            Logit
    "vala"           -0.17
    "lse"            -0.16
    " ourselves"     -0.15
    "自己"           -0.15
    "lr"             -0.14
    "zk"             -0.14
    "ラス"           -0.14
    "ous"            -0.14
    "üb"             -0.14
    " themselves"    -0.14

    Positive Logits
    Token            Logit
    "hood"           0.28
    "änd"            0.22
    "/self"          0.20
    "ishly"          0.20
    "same"           0.19
    "Portrait"       0.18
    " hood"          0.18
    "hoo"            0.18
    "ständ"          0.18
    "ridge"          0.18
    Activation Density: 0.035%

    No Known Activations