INDEX
    Explanations

    references to the concept of "self" or self-control

    references to the concept of "self" or self-related terms

    Negative Logits
    Token      Logit
    "cill"     -0.73
    "aldi"     -0.72
    "cot"      -0.69
    "lu"       -0.68
    "cli"      -0.67
    "cape"     -0.64
    "lam"      -0.64
    " mole"    -0.63
    "gypt"     -0.62
    "opers"    -0.61
    Positive Logits
    Token             Logit
    " Self"           3.35
    " self"           1.63
    "self"            1.56
    "Self"            1.53
    " selves"         1.20
    " Mutual"         1.00
    "selves"          0.98
    " Personality"    0.98
    " Narc"           0.96
    " Subtle"         0.93
    Act Density 0.005%

    No Known Activations