Explanations
mentions of gender disparities or differences, particularly focusing on boys and girls
mentions of boys and girls, focusing on gender-based distinctions
Negative Logits
Token            Logit
arily            -0.78
osi              -0.77
iture            -0.71
itures           -0.70
reimbursement    -0.66
ãĥķãĤ©           -0.65
lyak             -0.64
olia             -0.64
iped             -0.63
ibrary           -0.63
Positive Logits
Token            Logit
ages             1.21
aged             1.05
puberty          1.05
boys             0.98
Scouts           0.97
friend           0.91
boys             0.91
scouts           0.89
hift             0.88
scout            0.86
Activations Density: 0.055%