INDEX
    Explanations

    references to boys and girls in the text

    (A leading space in a token is shown as ␣.)

    Negative Logits
    BeginInit    -0.76
    ␣Genn        -0.75
    >")          -0.74
    ␣"").        -0.74
    \}}          -0.74
    ″]           -0.73
    ")}          -0.71
    ]").         -0.70
    AndEndTag    -0.69
    "=>"         -0.69

    Positive Logits
    ␣Boys         2.12
    ␣boys         2.12
    ␣boy          2.09
    ␣BOYS         2.06
    ␣BOY          2.00
    Boy           2.00
    Boys          1.98
    ␣Boy          1.96
    boy           1.95
    boys          1.92
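    Lists like the two above are often produced by projecting the feature's decoder direction onto the model's unembedding and keeping the k most negative and k most positive tokens. The sketch below shows that idea under stated assumptions: `w_dec` (the feature's decoder vector) and `W_U` (a token-to-unembedding-column map) are hypothetical toy inputs, any final layer norm is ignored, and this is not necessarily how this dashboard computes its values.

```python
from typing import Dict, List, Tuple


def logit_effects(w_dec: List[float], W_U: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the (hypothetical) decoder direction with each token's unembedding column."""
    return {tok: sum(a * b for a, b in zip(w_dec, col)) for tok, col in W_U.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) token/effect pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy 2-dimensional example; the numbers are invented for illustration only.
w_dec = [1.0, -0.5]
W_U = {
    " boys": [2.0, 0.0],
    " Genn": [-0.5, 0.5],
    "BeginInit": [0.0, 1.6],
    " girl": [0.5, 0.5],
}
top, bottom = top_and_bottom(logit_effects(w_dec, W_U), k=2)
print("positive:", top)
print("negative:", bottom)
```

    Sorting once and slicing from both ends keeps the example short; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would avoid the full sort.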
    Activation density: 0.034%
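    Assuming activation density means the fraction of token positions on which the feature's activation is above zero, 0.034% corresponds to roughly 34 firing positions per 100,000 tokens. A minimal sketch of that calculation follows; the function name and the `threshold` parameter are hypothetical, and the dashboard may use a different sample or threshold.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic example: 34 firing positions out of 100,000 tokens.
acts = [0.0] * 99_966 + [1.0] * 34
density = activation_density(acts)
print(f"{density:.3%}")
```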

    No Known Activations