INDEX
    Explanations

    pronouns referring to ourselves

    references to self-identity and self-perception

    NEGATIVE LOGITS
    "orie"      -0.75
    "onna"      -0.70
    " Sierra"   -0.65
    "lets"      -0.65
    "cemic"     -0.64
    "ros"       -0.63
    "heny"      -0.63
    "emis"      -0.63
    " Klu"      -0.62
    "mer"       -0.62
    POSITIVE LOGITS
    "selves"      1.61
    " ourselves"  1.25
    " tremend"    1.01
    "self"        0.98
    " selves"     0.96
    " exting"     0.95
    " proport"    0.84
    " eleph"      0.82
    " perspect"   0.81
    " exha"       0.81
    Act Density 0.006%

    No Known Activations