INDEX
    Explanations

    phrases with pronouns indicating specific individuals

    pronouns, particularly those referring to male individuals

    Negative Logits
    Token             Logit
    " Fail"           -0.65
    "fail"            -0.59
    " Killing"        -0.58
    " Underworld"     -0.58
    " Anarchy"        -0.57
    "Row"             -0.55
    " Description"    -0.55
    "Chem"            -0.55
    "metal"           -0.55
    " Destruction"    -0.55
    Positive Logits
    Token             Logit
    " expects"        1.39
    " understands"    1.29
    " believes"       1.28
    " intends"        1.25
    " regretted"      1.21
    " thinks"         1.20
    " disagrees"      1.20
    " hoped"          1.19
    " hopes"          1.18
    " regrets"        1.18
    Activation Density: 0.125%

    No Known Activations