    Explanations

    references to personal pronouns and possessive adjectives

    Negative Logits
    Token             Logit
    "ington"          -0.16
    " possibility"    -0.15
    "ered"            -0.15
    "ness"            -0.14
    "ering"           -0.14
    "iosa"            -0.14
    "ord"             -0.14
    "itest"           -0.14
    " lifestyles"     -0.13
    " ("              -0.13
    Positive Logits
    Token             Logit
    " own"            0.43
    "/her"            0.30
    "SELF"            0.26
    " próp"           0.25
    " Own"            0.24
    "Own"             0.24
    "self"            0.24
    "own"             0.24
    "_own"            0.23
    "sel"             0.22
    Activation Density: 0.965%

    No Known Activations