INDEX
    Explanations

    expressions of personality traits and self-descriptions

    Negative Logits
    ulet      -0.19
    anz       -0.15
    Vien      -0.15
    hea       -0.15
    ALIGN     -0.15
    uzey      -0.14
    éd        -0.14
    oug       -0.13
    kaar      -0.13
    poke      -0.13
    Positive Logits
     easily         0.20
     outgoing       0.20
     prone          0.18
     sensitive      0.18
     intro          0.18
     independent    0.17
     always         0.17
     liable         0.16
    boro            0.16
     liability      0.16
    Act Density 0.416%

    No Known Activations