    Explanations

    references to self-identity or self-concept

    Negative Logits
     Jensen
    -0.16
    ContentLoaded
    -0.15
     ComVisible
    -0.15
     jadx
    -0.14
    etin
    -0.14
    spb
    -0.14
    -transitional
    -0.14
    监听页面
    -0.14
     antid
    -0.13
    сь
    -0.13
    Positive Logits
    lessness
    0.23
    hood
    0.23
     preservation
    0.23
     Preservation
    0.22
     reliance
    0.21
    ish
    0.20
    ISH
    0.20
    uff
    0.19
    LESS
    0.18
     defense
    0.18
    Act Density 0.013%

    No Known Activations