    Explanations

    sexist, misogynistic, or blaming women

    Negative Logits
     રહ્યો
    0.98
     ગયો
    0.88
     אתה
    0.87
     நண்ப
    0.85
     метр
    0.81
     раствора
    0.75
    0.73
    તો
    0.73
     CONFIG
    0.71
     jego
    0.70
    Positive Logits
     women
    4.22
     female
    4.21
     feminist
    4.07
     feminine
    3.92
     feminism
    3.85
     여성
    3.83
     Women
    3.82
    Women
    3.79
     feminists
    3.76
     femininity
    3.75
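
Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the resulting score. The following is a minimal sketch of that ranking, not the dashboard's actual implementation: the 3-dimensional vectors, the tiny vocabulary, and the names `feature_logits` and `top_k` are all hypothetical, and the numbers are not the ones listed here.

```python
def feature_logits(decoder_dir, unembed):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_k(logits, k, positive=True):
    """Return the k tokens with the largest (or smallest) scores."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Hypothetical toy data: a 3-dim feature direction and a 4-token vocabulary.
decoder_dir = [1.0, 0.5, -0.25]
unembed = {
    " women": [2.0, 1.0, 0.0],
    " female": [1.5, 1.0, 0.5],
    " CONFIG": [-1.0, 0.0, 1.0],
    " jego": [-0.5, -0.5, 0.0],
}

logits = feature_logits(decoder_dir, unembed)
positive = top_k(logits, 2, positive=True)   # tokens the feature promotes
negative = top_k(logits, 2, positive=False)  # tokens the feature suppresses

for tok, score in positive + negative:
    print(repr(tok), round(score, 3))
```

The leading space in a token such as " women" is part of the token, which is why the sketch prints tokens with `repr`.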
    Act Density 0.709%
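
Activation density is usually read as the fraction of sampled tokens on which the feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, 7 of which activate the feature.
acts = [0.0] * 993 + [0.4, 1.2, 0.1, 2.5, 0.3, 0.9, 0.05]
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```

On this reading, a density of 0.709% would mean the feature is active on roughly 7 of every 1000 tokens.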

    No Known Activations