INDEX
    Explanations

    concepts related to power dynamics and authority

    Negative Logits
    arend      -0.17
    afari      -0.16
    eways      -0.15
    eward      -0.15
    urnal      -0.15
    eker       -0.15
    zelf       -0.14
    -worthy    -0.14
    bable      -0.14
    owers      -0.14
    Positive Logits
    fully      0.27
    full       0.21
    ful        0.20
    /power     0.20
    735        0.19
    lessness   0.18
    633        0.17
    fu         0.16
    lier       0.16
    aged       0.15
    Activation Density: 0.064%

    No Known Activations