INDEX
Explanations
gender-related characteristics and attitudes
Negative Logits
Token      Logit
REDACTED   -0.83
Assembly   -0.74
REC        -0.72
IVERS      -0.71
Deal       -0.67
KEN        -0.66
RAY        -0.64
ENC        -0.63
ITS        -0.63
VICE       -0.62
Positive Logits
Token      Logit
volent     1.41
opausal    1.17
hood       1.05
ager       0.99
uscript    0.99
folk       0.95
agers      0.94
gling      0.91
stru       0.90
hunt       0.90
Activations Density: 1.153%