INDEX
    Explanations

    references to power dynamics and authority

    Negative Logits
    Token      Logit
    sis        -0.23
    arget      -0.15
    gnore      -0.15
    _LAYER     -0.15
    ện         -0.15
    nore       -0.14
    sel        -0.14
    suppress   -0.14
    avage      -0.14
    iban       -0.14
    Positive Logits
    Token      Logit
    fully      0.44
    houses     0.34
    full       0.30
    lessness   0.27
    ful        0.26
    broker     0.25
    bro        0.25
    point      0.24
    lifting    0.24
     brokers   0.23
    Activation Density: 0.069%

    No Known Activations