INDEX
    Explanations

    references to concepts of "self" or identity

    self and reflexive pronouns

    Negative Logits
    Token            Logit
    " Infórmanos"    -0.80
    " Importing"     -0.60
    " importing"     -0.59
    " loopholes"     -0.53
    " Dian"          -0.53
    " נוס"           -0.53
    " crates"        -0.52
    " the"           -0.52
    " cozin"         -0.52
    " arenas"        -0.52
    Positive Logits
    Token            Logit
    "Self"           1.39
    " Self"          1.30
    " SELF"          1.27
    "self"           1.27
    " self"          1.26
    "SELF"           1.20
    " selves"        1.03
    " thyself"       0.87
    " Yourself"      0.84
    "himself"        0.84
    Activation Density: 0.020%

    No Known Activations