INDEX
    Explanations

    references to gendered terms related to boys and girls

    Appears before "George," "scout," or gendered terms

    Negative Logits
        Token                 Logit
        " InputDecoration"    -0.86
        " للمعارف"            -0.75
        "UnsafeEnabled"       -0.72
        "GraphicsUnit"        -0.66
        ")}</"                -0.64
        "asgi"                -0.63
        "IBOutlet"            -0.63
        "SBATCH"              -0.62
        " كومونز"             -0.62
        "InputBorder"         -0.61
    Positive Logits
        Token        Logit
        " scout"     0.90
        " scouts"    0.90
        " GIRLS"     0.84
        " girls"     0.83
        " Scouts"    0.79
        " Girls"     0.78
        "FRIEND"     0.77
        " girl"      0.76
        "Girls"      0.76
        "hood"       0.75
    Activation Density: 0.062%

    No Known Activations