INDEX
    Explanations

    words and phrases related to power dynamics and authority

    Negative Logits
    sis         -0.20
    _power      -0.16
    iban        -0.16
     poder      -0.15
    nore        -0.15
     pouvoir    -0.15
    POWER       -0.15
    津          -0.15
    arget       -0.15
    ureka       -0.14
    Positive Logits
    fully       0.42
    houses      0.32
    full        0.29
    ful         0.28
    bro         0.24
    lifting     0.24
    lessness    0.24
    FUL         0.23
    broker      0.23
    train       0.22
    Act Density 0.072%

    No Known Activations