    Explanations

    male reference (he/his)

    Negative Logits
    " shell"        -0.07
    "_SYSTEM"       -0.06
    "INVALID"       -0.06
    ">.↵↵"          -0.06
    "Something"     -0.06
    " SMART"        -0.06
    " surfaces"     -0.06
    "Moon"          -0.06
    " followers"    -0.06
    " cultivate"    -0.06
    Positive Logits
    " policemen"    0.08
    " bölge"        0.07
    "bound"         0.07
    "omas"          0.07
    " his"          0.06
    "-man"          0.06
    "CKET"          0.06
    " maxi"         0.06
    " competency"   0.06
    " karş"         0.06
    Activation Density 0.193%

    No Known Activations