INDEX
    Explanations

    references to women and their roles or experiences in various contexts

    Negative Logits
    /she            -0.20
     females        -0.18
    妻              -0.17
    ython           -0.17
     Female         -0.17
     female         -0.16
     жен            -0.16
     meisje         -0.16
     himself        -0.15
     guys           -0.15
    Positive Logits
    hood            0.28
    folk            0.24
    /man            0.22
    ized            0.22
    izing           0.22
    izers           0.22
    izer            0.21
    zimmer          0.21
     empowerment    0.19
    /m              0.17
    Activation Density: 0.057%

    No Known Activations