    Explanations

    references to female pronouns and possessive forms

    Negative Logits
    "emin"         -0.15
    "utin"         -0.15
    "ilo"          -0.14
    "inar"         -0.14
    "ifer"         -0.14
    " Pag"         -0.14
    "ogui"         -0.14
    "ingly"        -0.13
    " Examiner"    -0.13
    "apg"          -0.13
    Positive Logits
    " alike"       0.15
    "_userid"      0.14
    " Beaut"       0.14
    "Montserrat"   0.13
    "ulner"        0.13
    " Vend"        0.13
    "andbox"       0.13
    "Lorem"        0.13
    "津"           0.13
    "ifold"        0.13
    Act Density 0.006%

    No Known Activations