    Explanations

    references to female and male characters in the text

    pronouns describing people

    Negative Logits
    Token                Logit
    "RegressionTest"     -0.44
    " propOrder"         -0.41
    "気がする"             -0.40
    "})()"               -0.39
    "kwds"               -0.38
    " confusion"         -0.37
    " Alford"            -0.37
    " falsche"           -0.37
    "IntoConstraints"    -0.37
    " logging"           -0.36
    Positive Logits
    Token                Logit
    "ftagPool"           0.52
    "новништво"          0.52
    " virkelig"          0.51
    "Geplaatst"          0.49
    "fjspx"              0.47
    "adpleegd"           0.47
    "Билгалдахарш"       0.47
    "zydent"             0.47
    " astore"            0.46
    " nahilalakip"       0.43
    Activation Density: 0.045%

    No Known Activations