INDEX
    Explanations

    references to personal attributes or experiences

    Negative Logits
    " personalities"    -0.22
    " personality"      -0.20
    " Personnel"        -0.19
    " Personality"      -0.17
    "个人"              -0.17
    "arest"             -0.16
    "person"            -0.15
    "arian"             -0.15
    " personnel"        -0.15
    "AREST"             -0.15

    Positive Logits
    "izable"            0.26
    "ised"              0.25
    "ty"                0.23
    "izes"              0.23
    "izing"             0.23
    "ise"               0.22
    "ities"             0.21
    "isable"            0.21
    "/group"            0.21
    "ization"           0.21

    Activation Density: 0.045%

    No Known Activations