INDEX
    Explanations

    self | identity | consciousness

    Negative Logits
    Token    Value
    "P"      0.70
    "N"      0.64
    "W"      0.60
    "H"      0.59
    "L"      0.58
    "ذلك"    0.58
    "T"      0.55
    "G"      0.53
    "ع"      0.53
    "í"      0.52
    Positive Logits
    Token          Value
    " Identity"    0.80
    " identity"    0.75
    " IDENTITY"    0.70
    "۰"            0.66
    "0"            0.62
    "identity"     0.59
    "Identity"     0.59
    " Culture"     0.57
    (blank)        0.54
    "identité"     0.53
    Activation Density: 0.632%

    No Known Activations