Explanations
words related to a specific male group or category
references to male individuals or groups
Negative Logits
VICE      -0.72
Dog       -0.71
BILL      -0.67
kernels   -0.67
Pwr       -0.65
IVERS     -0.65
ENC       -0.64
EAR       -0.62
Ear       -0.62
INO       -0.61
Positive Logits
opausal   1.12
gling     1.10
uscript   1.02
endez     0.95
volent    0.94
士        0.92
thren     0.92
hood      0.92
orah      0.90
istan     0.87
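The page does not say how these logit values are produced. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's row of the model's unembedding matrix and then list the tokens with the smallest and largest scores. A minimal sketch under that assumption; the helper names and the 2-dimensional vectors are hypothetical toy data:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helper. Assumption: a token's "logit" for this feature is the dot
# product of the feature's decoder direction with that token's unembedding row.
def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    return {
        token: sum(d * u for d, u in zip(decoder_dir, row))
        for token, row in unembed.items()
    }

# Split the scores into the k most negative and the k most positive tokens.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy data (made up); a real model would use d_model-sized vectors.
decoder_dir = [1.0, -0.5]
unembed = {
    "hood": [0.8, -0.2],
    "orah": [0.5, 0.0],
    "Dog": [-0.6, 0.3],
    "BILL": [-0.4, 0.4],
}
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Negative:", [t for t, _ in negative])  # Negative: ['Dog', 'BILL']
print("Positive:", [t for t, _ in positive])  # Positive: ['hood', 'orah']
```

Sorting once and reading both ends of the ranking keeps the two lists consistent with each other; a dashboard over a full vocabulary would more likely use a partial top-k selection instead of a full sort.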
Activation Density: 0.016%
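Activation density is presumably the share of tokens on which the feature activates. If "activates" means a strictly positive activation (the threshold is an assumption, not stated on this page), 0.016% corresponds to about 16 firing tokens per 100,000. A minimal sketch with a hypothetical helper:

```python
from typing import Iterable

# Hypothetical helper: percentage of tokens on which a feature "fires".
# Assumption: firing means an activation strictly above `threshold` (default 0).
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# Toy example: 16 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 16_000, 1_000):
    acts[i] = 2.5
print(f"{activation_density(acts):.3f}%")  # 0.016%
```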