INDEX
    Explanations

    concepts related to gender roles and identities

    Negative Logits
    " その他"     -0.16
    "ourd"        -0.15
    "íky"         -0.14
    "(other"      -0.14
    "roat"        -0.14
    " altri"      -0.14
    "#End"        -0.14
    "ylko"        -0.13
    "agger"       -0.13
    "ロン"        -0.13
    Positive Logits
    " numerator"   0.31
    " left"        0.25
    " either"      0.23
    " east"        0.22
    " Left"        0.22
    "either"       0.21
    " male"        0.21
    " north"       0.20
    "Either"       0.20
    " offense"     0.20
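    The positive and negative logit lists are usually read as the feature's direct effect on the output vocabulary. Below is a minimal sketch of how lists like these are commonly produced. It assumes that the feature's decoder row is projected straight through the model's unembedding matrix, with no final layer norm applied. The names `top_logits`, `w_dec_row`, `w_u`, and the toy vocabulary are hypothetical and are not this dashboard's actual pipeline.

```python
import numpy as np


def top_logits(w_dec_row: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    Hypothetical inputs (assumptions, not a real dashboard API):
      w_dec_row: the feature's decoder direction, shape (d_model,)
      w_u:       the model's unembedding matrix, shape (d_model, vocab_size)
      vocab:     token strings indexed by token id
    """
    # Direct (logit-lens style) effect of the feature direction on every token's logit.
    logits = w_dec_row @ w_u
    # Indices sorted from most negative to most positive logit.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
pos, neg = top_logits(
    np.array([1.0, 0.0]),
    np.array([[0.3, -0.1, 0.2, -0.2], [0.0, 0.0, 0.0, 0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(pos)  # most-promoted tokens first
print(neg)  # most-suppressed tokens first
```

    The sketch uses the raw dot product. A real pipeline may also fold in layer-norm scaling or bias terms, and those would shift the values.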
    Activation Density 0.686%
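    Activation density is the share of token positions on which the feature fires. Below is a minimal sketch, assuming "fires" means an activation strictly above a threshold of 0, which is typical for ReLU-style sparse autoencoders. The function name `activation_density` and the threshold default are hypothetical.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds `threshold`.

    `acts` is a hypothetical 1-D array of this feature's activations, one per token.
    """
    active = np.count_nonzero(acts > threshold)
    # Multiply before dividing to keep simple cases like 7/1000 exact in floating point.
    return 100.0 * active / acts.size


# 7 active positions out of 1,000 tokens.
acts = np.zeros(1000)
acts[:7] = 1.5
print(activation_density(acts))
```

    Read this way, a density of 0.686% means the feature is active on roughly 7 of every 1,000 tokens.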

    No Known Activations