INDEX
    Explanations

    references to women

    Negative Logits
    "ypes"          -0.86
    " Flavoring"    -0.83
    "rador"         -0.78
    "UFF"           -0.76
    "ython"         -0.75
    "ysical"        -0.74
    "agascar"       -0.74
    "umbn"          -0.73
    "rss"           -0.73
    "DIS"           -0.71
    Positive Logits
    "hood"          1.11
    "izer"          1.01
    "folk"          0.94
    "pher"          0.85
    "cule"          0.83
    " woman"        0.82
    " who"          0.81
    "izers"         0.77
    "uscript"       0.76
    "comed"         0.75
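    The page does not say how these logit values are computed. Assuming the common convention for sparse-autoencoder feature pages, each value is the feature's direct effect on one token's logit (its decoder direction multiplied by the model's unembedding matrix), and the two lists are the most negative and most positive entries of that vector. The sketch below illustrates this under that assumption only; the helpers `logit_effects` and `top_and_bottom`, the weights, and the 4-token vocabulary are all hypothetical toy values, not taken from the model behind this page.

```python
import heapq


def logit_effects(decoder_dir, W_U, vocab):
    """Project a feature's decoder direction through the unembedding.

    decoder_dir: d_model floats (hypothetical feature direction).
    W_U: d_model rows, each with len(vocab) floats (hypothetical unembedding).
    Returns {token: direct logit effect}.
    """
    return {
        tok: sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    pos = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    neg = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return pos, neg


# Toy example: d_model = 2, vocabulary of 4 tokens (all values hypothetical).
# Leading spaces are part of the token strings, as on the page above.
vocab = ["hood", " woman", "ypes", "UFF"]
W_U = [
    [1.0, 0.5, -0.5, 0.0],
    [0.0, 0.5, -0.5, -1.0],
]
decoder_dir = [1.0, 0.5]

effects = logit_effects(decoder_dir, W_U, vocab)
pos, neg = top_and_bottom(effects, 2)
print(pos)  # [('hood', 1.0), (' woman', 0.75)]
print(neg)  # [('ypes', -0.75), ('UFF', -0.5)]
```

    Because the same vector supplies both lists, a token cannot appear among both the top positive and the top negative entries.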
    Activation Density 0.041%
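    The page does not define this figure either. Under the usual convention (an assumption here), activation density is the fraction of token positions on which the feature's activation is above zero, so 0.041% corresponds to roughly 41 tokens in every 100,000. The function name `activation_density` and the activation data below are hypothetical, chosen only to reproduce that ratio.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations over 100,000 tokens: the feature fires on 41 of them.
acts = [0.0] * 100_000
for i in range(0, 41 * 1000, 1000):
    acts[i] = 2.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.041%
```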

    No Known Activations