Explanations
phrases expressing societal issues and disparities
references to gender dynamics and societal perceptions related to men and women
Negative Logits
iHUD      -0.75
Cooldown  -0.71
Closure   -0.69
IB        -0.67
pleted    -0.66
ONSORED   -0.65
iator     -0.64
eur       -0.63
Chain     -0.63
INAL      -0.62
Positive Logits
disproportionately  1.03
outnumbered         1.00
happiest            0.92
themselves          0.92
enslaved            0.85
discriminated       0.82
overwhelmingly      0.80
oppressed           0.79
biologically        0.79
joice               0.78
Activations Density: 0.582%