    Explanations

    concepts related to self-identity and personal expression

    Negative Logits
    Token          Logit
    "ungle"        -0.19
    " Gir"         -0.18
    "urahan"       -0.16
    "monds"        -0.15
    " hands"       -0.15
    " Length"      -0.14
    "angan"        -0.14
    " Minds"       -0.14
    "yc"           -0.13
    " wsp"         -0.13
    Positive Logits
    Token          Logit
    "identity"      0.31
    " Identity"     0.31
    " identity"     0.30
    "Identity"      0.29
    "_identity"     0.28
    ".Identity"     0.26
    " identities"   0.23
    ".identity"     0.23
    "身份"          0.20   (Chinese: "identity")
    ".IDENTITY"     0.20
    Activation Density: 0.213%
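Activation density is usually reported as the share of sampled tokens on which the feature fires (activation above zero). Under that assumed definition, 0.213% works out to roughly 2 activating tokens per 1,000. The sketch below computes the figure from a hypothetical list of activation values. The function name and the threshold of 0.0 are assumptions, not the dashboard's documented method.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 100,000 tokens, of which 213 have a nonzero activation.
acts = [0.0] * 100_000
for i in range(0, 213 * 400, 400):  # 213 evenly spaced positions
    acts[i] = 1.5

print(f"{activation_density(acts):.3f}%")  # prints 0.213%
```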

    No Known Activations