    Explanations

    references to male characters or entities

    Negative Logits
    Token           Logit
    " comod"        -0.69
    " auc"          -0.67
    " aspec"        -0.65
    " Pria"         -0.65
    "Źródło"        -0.65
    " Atas"         -0.65
    " mín"          -0.64
    " aig"          -0.63
    " Monfieur"     -0.63
    " Inscrivez"    -0.62
    Positive Logits
    Token           Logit
    " he"           2.00
    " He"           1.75
    "He"            1.69
    " she"          1.65
    " himself"      1.45
    "she"           1.38
    "She"           1.37
    " his"          1.33
    "himself"       1.30
    " She"          1.29
    Activation Density: 0.249%

    No Known Activations