INDEX
    Explanations

    mentions of gendered terms, specifically related to boys and girls

    Negative Logits
    " mergeFrom"        -0.48
    " EconPapers"       -0.46
    " linkovi"          -0.44
    "Personensuche"     -0.43
    "Autowired"         -0.43
    "</thead>"          -0.41
    "när"               -0.41
    "UnknownFields"     -0.40
    "BeginInit"         -0.38
    " unknownFields"    -0.38
    Positive Logits
    " scout"            0.82
    " scouts"           0.79
    " Girl"             0.78
    "Girl"              0.77
    "Boy"               0.77
    "boy"               0.74
    "girl"              0.73
    " Scouts"           0.72
    " Scout"            0.68
    " Boy"              0.68
    Act Density 0.092%

    No Known Activations