    Explanations

    terms related to self-identity and self-worth

    Negative Logits
    "rote"       -0.17
    "uden"       -0.16
    "ombies"     -0.16
    " ["         -0.15
    "dana"       -0.15
    "ary"        -0.15
    "åļ"         -0.14
    "irty"       -0.14
    "aph"        -0.14
    "tee"        -0.14
    Positive Logits
    "ональ"               0.16
    "んと"                0.16
    "ElementsByTagName"   0.15
    "ANNEL"               0.15
    "たし"                0.15
    " pokoj"              0.14
    " pitches"            0.14
    "atoi"                0.14
    "ultz"                0.14
    ".beta"               0.14
    Activation Density 0.040%
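    Activation density is read here as the share of sampled token positions on which the feature is active; that definition, and the threshold of 0, are assumptions rather than details given on this page. A density of 0.040% then means the feature fires on roughly 4 of every 10,000 tokens. A minimal sketch, with the hypothetical helper activation_density:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: return the percentage of token positions whose
    feature activation is strictly greater than `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 4 active positions out of 10,000 sampled tokens
    sample = [0.0] * 9996 + [1.0] * 4
    print(activation_density(sample))
```

    Four active positions out of 10,000 give 0.04, which matches the 0.040% figure above.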

    No Known Activations