    Explanations

    references to women and related issues within societal or contextual discussions

    Negative Logits
    Token             Logit
    " Furn"           -0.14
    "klass"           -0.14
    "etto"            -0.14
    "ght"             -0.14
    "erk"             -0.14
    "atan"            -0.14
    "URED"            -0.14
    "IColor"          -0.13
    " Repository"     -0.13
    " tail"           -0.13
    Positive Logits
    Token             Logit
    " Zem"            0.16
    " Kee"            0.16
    "ãĥ¼ãĥł"          0.16
    ".Aggressive"     0.15
    " forces"         0.15
    "izard"           0.15
    "orges"           0.15
    "orce"            0.14
    " natural"        0.14
    "uder"            0.14
    Activation Density: 0.016%

    No Known Activations