    Explanations

    references to male individuals, particularly using the word "guy."

    Negative Logits
    Token (quoted)        Logit
    " Eber"               -0.73
    "———-"                -0.69
    " AER"                -0.67
    [token not captured]  -0.66
    "=”"                  -0.66
    " Kear"               -0.64
    "くると"              -0.63
    " Beg"                -0.63
    "d"                   -0.63
    "kso"                 -0.62
    Positive Logits
    Token (quoted)        Logit
    " guys"               1.75
    "Guys"                1.75
    "guys"                1.74
    " GUYS"               1.70
    " Guys"               1.70
    " GUY"                1.61
    " guy"                1.55
    " Guy"                1.52
    "guy"                 1.43
    "Guy"                 1.43
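
The positive-logit tokens listed above differ only in capitalization and in whether they carry a leading space. A minimal sketch, assuming that stripping whitespace and lowercasing is an acceptable way to normalize them, shows the ten entries collapsing into two word forms. The helper name `group_by_word` is hypothetical and not part of any dashboard tooling; the values are copied from the listing.

```python
from collections import defaultdict

# Top positive-logit tokens and their values, copied from the listing above.
# A leading space is part of the token.
positive_logits = [
    (" guys", 1.75), ("Guys", 1.75), ("guys", 1.74), (" GUYS", 1.70),
    (" Guys", 1.70), (" GUY", 1.61), (" guy", 1.55), (" Guy", 1.52),
    ("guy", 1.43), ("Guy", 1.43),
]


def group_by_word(pairs):
    """Group logit values by the token's stripped, lowercased form.

    Hypothetical helper: the normalization rule is an assumption made
    for illustration only.
    """
    groups = defaultdict(list)
    for token, logit in pairs:
        groups[token.strip().lower()].append(logit)
    return dict(groups)


groups = group_by_word(positive_logits)
for word, values in groups.items():
    # Print the word form, how many token variants map to it, and their mean logit.
    print(word, len(values), round(sum(values) / len(values), 2))
```

Under this normalization every listed token maps to either "guys" or "guy", which is consistent with the explanation given for the feature.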
    Activation Density: 0.058%

    No Known Activations