INDEX
    Explanations

    references to power dynamics and gender perceptions in social contexts

    Head Attr Weights
    0:0.03
    1:0.12
    2:0.03
    3:0.03
    4:0.05
    5:0.12
    6:0.02
    7:0.03
    8:0.10
    9:0.32
    10:0.05
    11:0.04
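
A minimal sketch for reading the head attribution weights listed above: it ranks the twelve heads by weight and sums the listed values. The variable names are illustrative, and the listing does not state how the weights are normalized, so the sum is computed only as a sanity check on the rounded figures.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.12, 2: 0.03, 3: 0.03,
    4: 0.05, 5: 0.12, 6: 0.02, 7: 0.03,
    8: 0.10, 9: 0.32, 10: 0.05, 11: 0.04,
}

# Rank heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# Sum of the listed (rounded) weights; normalization is an open assumption.
total = sum(head_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

Under these values, head 9 carries the largest single weight, with heads 1 and 5 tied for the next largest.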
    Negative Logits
     continue    -2.52
     undertake   -2.47
     acquire     -2.46
     occupy      -2.46
    ustomed      -2.46
     fare        -2.45
     venture     -2.39
     liberate    -2.34
     carve       -2.34
     earn        -2.33
    Positive Logits
    ifies           3.21
     communicates   3.09
    inates          3.00
    its             2.99
    pires           2.99
    ㅋㅋ            2.95
    doesn           2.87
    acters          2.81
     applies        2.79
    """             2.77
    Act Density 0.043%

    No Known Activations