Explanations
pronouns and male names in contexts indicating personal relationships
references to male characters
Negative Logits
Profit       -0.68
é¾           -0.66
Seller       -0.65
iciency      -0.64
nm           -0.62
Smy          -0.62
Springs      -0.61
Delicious    -0.61
maxwell      -0.60
CNN          -0.60
Positive Logits
panic         0.81
aternity      0.76
avier         0.75
antagonist    0.73
Į             0.69
alin          0.68
aps           0.67
guardians     0.67
anatomy       0.66
otomy         0.65
Activation Density: 0.243%