INDEX
    Explanations

    gender-related terms and references

    Negative Logits
    IVEREF            -0.81
    EDEFAULT          -0.79
     للاسماء           -0.79
    Попис             -0.78
     Plin             -0.76
    })));             -0.74
     صوتيه            -0.74
    FormTagHelper     -0.72
    ='')              -0.71
    }\]               -0.70
    Positive Logits
     gender            0.67
     Gender            0.58
    Gender             0.56
    volent             0.51
    gender             0.51
     Male              0.49
     ген               0.48
     Men               0.47
    Male               0.45
    ogyn               0.44
    Activation Density: 0.191%

    No Known Activations