    Explanations

    references to female identities and gender dynamics

    Negative Logits
    nya     -0.15
    ese     -0.15
    ary     -0.15
    meld    -0.15
    lu      -0.15
    ning    -0.15
    nel     -0.14
    ally    -0.14
    sel     -0.14
    rig     -0.14
    Positive Logits
    itarian     0.19
    volent      0.18
    洲          0.18
    別          0.18
    factor      0.17
    erre        0.16
    .Flag       0.15
    Outlined    0.14
    别          0.14
     hoàng      0.14
    Act Density 0.015%

    No Known Activations