INDEX
    Explanations

    pronouns associated with male individuals

    Negative Logits
    iciel       -0.15
    umper       -0.15
    -webpack    -0.15
     ZemÄĽ      -0.14
     è»         -0.14
    WEEN        -0.14
     æĿ         -0.14
    \base       -0.13
    otec        -0.13
    deniz       -0.13
    Positive Logits
    /her         0.20
     or          0.18
    /she         0.17
    .her         0.16
    idi          0.15
    éľ           0.14
     gol         0.14
    ãĤ·ãĥ£       0.14
     Richards    0.14
    123          0.14
    Act Density 0.111%

    No Known Activations