INDEX
Explanations
references to gender-specific terms related to boys and girls
Negative Logits
Berne          -0.42
()]            -0.37
UnknownFields  -0.37
});            -0.37
Diane          -0.36
cerely         -0.36
insegna        -0.35
näm            -0.35
Diane          -0.34
Cane           -0.34
Positive Logits
girls          0.75
Girls          0.73
Girls          0.71
Girl           0.69
boys           0.67
Boy            0.66
girls          0.66
Girl           0.66
girl           0.64
girl           0.64
Activation Density: 0.072%