INDEX
    Explanations

    references to gender-specific terms related to boys and girls

    Negative Logits
    Token (quoted)      Logit
    " Berne"            -0.42
    "()]\n"             -0.37
    "UnknownFields"     -0.37
    "});\n\n"           -0.37
    " Diane"            -0.36
    "cerely"            -0.36
    "insegna"           -0.35
    " näm"              -0.35
    "Diane"             -0.34
    " Cane"             -0.34
    Positive Logits
    Token (quoted)      Logit
    " girls"            0.75
    "Girls"             0.73
    " Girls"            0.71
    "Girl"              0.69
    "boys"              0.67
    "Boy"               0.66
    "girls"             0.66
    " Girl"             0.66
    "girl"              0.64
    " girl"             0.64
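
The page does not say how the logit lists are produced. A common approach, assumed here, is to take the dot product of a feature's output direction with each token's unembedding vector and then rank the tokens by the result. The sketch below illustrates that idea only; `logit_effects`, `top_tokens`, the two-dimensional vectors, and the extra token "the" are hypothetical, and the numbers are toy values rather than data from this page.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding row."""
    return {
        token: sum(d * w for d, w in zip(direction, row))
        for token, row in unembed.items()
    }


def top_tokens(effects, k, positive=True):
    """Return the k tokens with the largest (or smallest) effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy inputs (invented for illustration, not taken from the page).
direction = [1.0, 0.0]
unembed = {
    " girls": [2.0, 0.1],
    "boys": [1.5, -0.2],
    " Berne": [-1.0, 0.3],
    "the": [0.0, 0.5],
}

effects = logit_effects(direction, unembed)
print(top_tokens(effects, 2, positive=True))   # most promoted tokens
print(top_tokens(effects, 1, positive=False))  # most suppressed token
```

Leading spaces are kept inside the token strings because " girls" and "girls" are distinct vocabulary entries, which is why both appear in the lists above.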
    Activation Density: 0.072%
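
The page does not define the density figure. One plausible reading, assumed here, is the fraction of token positions on which the feature fires at all; under that reading, 0.072% corresponds to roughly 72 active positions per 100,000 tokens. The sketch below shows the arithmetic; `activation_density` and the toy activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (invented): 72 active positions out of 100,000.
toy = [1.0] * 72 + [0.0] * 99_928
density = activation_density(toy)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse, firing on well under one token in a thousand.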

    No Known Activations