    Explanations

    say pronouns referring to a male

    Negative Logits
    Token         Logit
    " immun"      -0.07
    "em"          -0.07
    " Toby"       -0.07
    " Seller"     -0.07
    " Teens"      -0.06
    " Mills"      -0.06
    "دو"          -0.06
    " Vari"       -0.06
    "--+"         -0.06
    " Nielsen"    -0.06
    Positive Logits
    Token               Logit
    "iselect"           0.07
    "_QUESTION"         0.07
    "年の"              0.06
    " Εκ"               0.06
    " >↵"               0.06
    "complexContent"    0.06
    "uccess"            0.06
    " 이야"             0.06
    "عنی"               0.06
    "-Language"         0.06
    Activation Density: 0.070%

    No Known Activations