Explanations
references to women and their roles or identities
Negative Logits
himself       -1.01
Himself       -0.85
himself       -0.78
који          -0.78
koji          -0.76
AddTagHelper  -0.66
__':          -0.63
Jr            -0.61
__":          -0.61
boyhood       -0.61
Positive Logits
herself       1.62
herself       1.18
bint          0.92
she           0.91
her           0.83
ihrem         0.83
actress       0.82
shes          0.81
حياتها        0.79
ihren         0.78
Activations Density: 1.492%