    Explanations

    mentions of boys or male characters in various contexts

    Negative Logits
     InputDecoration    -0.89
    ″]                  -0.80
    >")                 -0.79
    IsContent           -0.79
    ‬‬                  -0.78
    AndEndTag           -0.78
     PMC                -0.74
     للمعارف            -0.74
    "]}                 -0.74
    ')}                 -0.74
    Positive Logits
     boys               1.88
     Boys               1.83
     boy                1.83
     Boy                1.83
    Boy                 1.82
     BOY                1.78
     BOYS               1.76
    boy                 1.75
    Boys                1.67
    boys                1.64
    Activation Density: 0.032%

    No Known Activations