INDEX
Explanations
pronouns followed by actions related to physical interactions or behaviors involving another person
pronouns and references to a male subject
Negative Logits
Token         Logit
Bundes        -0.74
nb            -0.72
Mae           -0.70
carb          -0.65
Marketable    -0.65
Counter       -0.64
utical        -0.63
counter       -0.63
CNN           -0.61
Fuel          -0.60
Positive Logits
Token         Logit
ctor          0.80
semen         0.79
ading         0.73
sing          0.72
ejac          0.71
penis         0.71
handsome      0.70
sperm         0.69
avier         0.68
circumcised   0.67
Activations Density: 0.299%