INDEX
    Explanations

    Words related to the concept of "self" or identity.

    Negative Logits
    Token      Logit
    pars       -0.18
    bane       -0.17
    p          -0.17
    πος        -0.17
    ested      -0.17
    esch       -0.17
    es         -0.16
    esan       -0.16
    esiz       -0.16
    lected     -0.15

    Positive Logits
    Token      Logit
    OUNT       0.23
    plitude    0.21
    nesty      0.21
    bling      0.20
    بول        0.19
    plit       0.19
    pton       0.19
    eric       0.19
    ERICAN     0.19
    bole       0.19

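    The lists above rank tokens by how strongly this feature pushes their logits down or up. The page does not say how the scores are derived; a common method, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below uses hypothetical names (`logit_effects`, `direction`, `W_U`) and toy data, not this model's weights.

```python
import numpy as np

def logit_effects(direction: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Return the k most negative and k most positive direct logit effects.

    Assumption: scores are a plain projection onto the unembedding.
    direction: (d_model,) decoder vector for the feature (hypothetical input)
    W_U:       (d_model, n_vocab) unembedding matrix (hypothetical input)
    vocab:     list of n_vocab token strings
    """
    scores = direction @ W_U           # (n_vocab,) effect on each token's logit
    order = np.argsort(scores)         # token indices, most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: identity unembedding, so each score equals a direction component
vocab = ["a", "b", "c", "d"]
W_U = np.eye(4)
direction = np.array([0.5, -0.2, 0.1, -0.3])
neg, pos = logit_effects(direction, W_U, vocab, k=2)
print(neg)  # [('d', -0.3), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

    Sorting once and slicing both ends yields both lists from a single pass. Real dashboards may also fold in a final layer norm or other scaling, which this sketch omits.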
    Activation density: 0.069%
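    Activation density is presumably the fraction of sampled token positions on which the feature is active. The page states neither the activation threshold nor the sample size, so both are assumptions in the sketch below (`activation_density` is a hypothetical helper). At 0.069%, the feature would fire on roughly 69 of every 100,000 tokens.

```python
import numpy as np

def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold` (assumed definition)."""
    return float(np.mean(activations > threshold))

# Toy data: 69 active positions out of 100,000 (illustrative, not real activations)
acts = np.zeros(100_000)
acts[:69] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.069%"
```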

    No known activations.