INDEX
Explanations
pronouns related to a male subject
references to male characters in various contexts
Negative Logits
CNN          -0.77
Bundes       -0.73
Mae          -0.71
nb           -0.65
Temperature  -0.64
sovere       -0.64
Period       -0.64
mary         -0.62
Thatcher     -0.62
WN           -0.62
Positive Logits
penis        0.86
semen        0.84
handsome     0.83
ejac         0.81
dick         0.80
circumcised  0.80
sperm        0.77
avier        0.76
cock         0.75
don          0.75
Activations Density: 0.269%