INDEX
Explanations
references to gender differences, specifically focusing on boys
repeated references to boys
Negative Logits
itures      -0.79
lyak        -0.77
iture       -0.75
arily       -0.73
Solitaire   -0.70
acle        -0.70
Las         -0.68
igation     -0.68
office      -0.67
iped        -0.67
Positive Logits
ages        1.02
boys        1.01
Scouts      0.99
friend      0.93
puberty     0.91
volent      0.91
boys        0.90
hood        0.86
scout       0.81
bands       0.80
Activations Density: 0.041%