INDEX
Explanations
gender-specific mentions of boys and girls
mentions of gendered terms related to boys and girls
Negative Logits
lich      -0.77
Hulk      -0.72
sole      -0.71
uncture   -0.71
eering    -0.71
Root      -0.70
ointment  -0.70
rehend    -0.67
rella     -0.66
emort     -0.66
Positive Logits
hift      1.01
ages      0.88
ieve      0.84
emen      0.83
mith      0.78
folk      0.76
Clubs     0.75
pace      0.75
boys      0.72
clubs     0.72
Activations Density: 0.038%