INDEX
    Explanations

    attributes related to power dynamics and authority figures

    Negative Logits
    œurs
    -0.42
    bibinfo
    -0.41
    DetailActivity
    -0.40
    -0.38
     excru
    -0.38
     Easter
    -0.36
    出版年
    -0.35
    jangkau
    -0.35
    MessageTagHelper
    -0.35
    ])):
    -0.35
    Positive Logits
     power
    0.68
     arrogance
    0.67
     arrogant
    0.66
     swagger
    0.65
     proud
    0.63
     pompous
    0.63
     strut
    0.61
     prestige
    0.60
     confidently
    0.60
     pride
    0.58
    Act Density 0.344%

    No Known Activations