INDEX
    Explanations

    terms related to gender identity and gender norms

    Head Attr Weights
    0:0.02
    1:0.02
    2:0.07
    3:0.21
    4:0.03
    5:0.02
    6:0.17
    7:0.19
    8:0.04
    9:0.04
    10:0.07
    11:0.07
    Negative Logits
    " Pradesh"     -1.34
    "DERR"         -1.21
    "ו"            -1.17
    "ventory"      -1.10
    " Airways"     -1.10
    "Lord"         -1.10
    "Priv"         -1.09
    "Wil"          -1.09
    "schild"       -1.09
    "ドラゴン"      -1.08
    Positive Logits
    " shaming"     1.19
    " deport"      1.15
    " seniors"     1.11
    "inki"         1.05
    " Dems"        1.05
    " remix"       1.04
    " ornament"    1.03
    " dystop"      1.02
    "alogy"        1.02
    " activism"    1.01
    Act Density 0.012%

    No Known Activations