INDEX
    Explanations

    phrases indicating the presence or involvement of women in various contexts

    Negative Logits
    " purpoſe"   -0.94
    " pleaſure"  -0.89
    " juſ"       -0.81
    " ſever"     -0.80
    " myſelf"    -0.79
    " feroit"    -0.77
    " uſe"       -0.77
    " ſtate"     -0.77
    " uſed"      -0.75
    " ſta"       -0.74
    Positive Logits
    "s"     1.37
    " s"    0.69
    "{~"    0.60
    " own"  0.59
    "𝑠"     0.55
    "ils"   0.54
    "Ys"    0.52
    "etts"  0.52
    "mens"  0.52
    "ds"    0.51
    Activation Density: 0.248%

    No Known Activations