INDEX
    Explanations

    references to women and their roles or identities

    NEGATIVE LOGITS
    Token (\n = newline)    Logit
    " himself"              -1.01
    " Himself"              -0.85
    "himself"               -0.78
    " који"                 -0.78
    " koji"                 -0.76
    "AddTagHelper"          -0.66
    "__':\n"                -0.63
    " Jr"                   -0.61
    "__\":\n"               -0.61
    " boyhood"              -0.61
    POSITIVE LOGITS
    Token                   Logit
    " herself"              1.62
    "herself"               1.18
    " bint"                 0.92
    " she"                  0.91
    " her"                  0.83
    " ihrem"                0.83
    " actress"              0.82
    " shes"                 0.81
    "حياتها"                0.79
    " ihren"                0.78
    Activation Density: 1.492%

    No Known Activations