Explanations
references to male individuals and their characteristics
Negative Logits
guys        -0.23
Guys        -0.20
boys        -0.20
Sisters     -0.18
Boys        -0.18
ners        -0.17
men         -0.17
males       -0.17
sel         -0.16
dh          -0.16
Positive Logits
volent      0.43
fic         0.28
factor      0.26
-dominated  0.26
/f          0.25
uada        0.23
vol         0.21
bonding     0.21
faction     0.20
strom       0.18
Activations Density: 0.019%