    Explanations

    references to male titles or honorifics

    Negative Logits
    ondo      -0.15
    istrov    -0.15
    adh       -0.15
    ýš        -0.14
    IDDEN     -0.14
    rades     -0.14
    iper      -0.14
    οÏħλ      -0.14
    oa        -0.14
    ãĤ¤ãĤ¯    -0.14
    Positive Logits
    ships                    0.22
    ship                     0.21
    innen                    0.16
    zek                      0.15
    urb                      0.15
    üh                       0.15
    ified                    0.15
     ApplicationException    0.14
    esses                    0.14
    ekyll                    0.14
    Activation Density: 0.156%

    No Known Activations