INDEX
    Explanations

    references to male individuals and their roles in various contexts

    Negative Logits
    dale     -0.18
    lu       -0.18
    ally     -0.18
    sale     -0.18
    naire    -0.17
    lauf     -0.16
    ale      -0.16
    rial     -0.15
    ese      -0.15
    rie      -0.15
    Positive Logits
    volent        0.26
    -dominated    0.18
    itarian       0.18
    factor        0.17
    ştir          0.17
    ynes          0.15
    cul           0.15
    洲            0.15
    utdown        0.15
    quota         0.14
    Activation Density: 0.024%

    No Known Activations