    Explanations

    references to men and male-related topics

    Negative Logits
    Token        Logit
    "Sad"        -0.19
    " Sad"       -0.17
    "ters"       -0.16
    "eren"       -0.15
    "ahn"        -0.15
    "atik"       -0.15
    "aters"      -0.15
    "ernote"     -0.14
    "uil"        -0.14
    "tee"        -0.14
    Positive Logits
    Token        Logit
    "opause"     0.23
    "cken"       0.21
    "endez"      0.20
    "orca"       0.20
    "ubar"       0.20
    "kes"        0.20
    "iscal"      0.19
    "едж"        0.19
    "isci"       0.19
    "acing"      0.19
    Activation Density: 0.016%

    No Known Activations