INDEX
    Explanations

    pronouns referring to males

    Negative Logits
    Doing  0.50
     Doing  0.46
    οδο  0.46
    CNT  0.40
     doing  0.39
    oman  0.39
    ทำ  0.37
     làm  0.37
    ώσεις  0.37
    Doom  0.36
    Positive Logits
    adou  0.43
     "__  0.41
     Iwas  0.39
    handled  0.39
     Frey  0.38
     बराबर  0.38
    (no token shown)  0.38
     Fred  0.37
     fing  0.37
    (no token shown)  0.37
    Act Density 0.000%

    No Known Activations