INDEX
Explanations
pronouns referring to a male individual along with actions or characteristics associated with him
references to male individuals in various contexts
Negative Logits
Token      Logit
Bundes     -0.79
nb         -0.73
CNN        -0.69
Counter    -0.68
counter    -0.67
carb       -0.66
Mae        -0.65
Period     -0.64
Disk       -0.63
Actress    -0.62
Positive Logits
Token      Logit
handsome   0.86
semen      0.84
penis      0.81
ejac       0.80
cock       0.79
sperm      0.78
dick       0.77
avier      0.71
ctor       0.71
ading      0.70
Activations Density: 0.274%