INDEX
Explanations
terms related to gender distribution and roles
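Negative Logits
BrowserModule  -0.76
Vidite         -0.69  (South Slavic, "you see")
Parigi         -0.69  (Italian, "Paris")
المعيارى       -0.69  (Arabic, "standard")
afficheront    -0.68  (French, "will display")
UnsafeEnabled  -0.67
الإنجليزية     -0.67  (Arabic, "English")
صوتيه          -0.66  (Arabic, "audio, vocal")
atimes         -0.65
❋              -0.64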
Positive Logits
male           1.36
women          1.20
Male           1.15
Male           1.14
males          1.13
male           1.09
female         1.09
masculine      1.05
männ           1.05  (German stem, as in "männlich", "male")
Female         1.02
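Logit lists like the ones above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. The page does not say exactly how its values were computed, so the following is only a minimal sketch under that assumption. `w_dec`, `W_U` and the toy vocabulary are hypothetical, and the sketch skips refinements that some tools apply, such as folding in the final layer norm. Strings that appear twice in the lists (for example "male" or "Male") are most likely distinct tokens, such as one with a leading space and one without, whose whitespace was lost when the page was extracted.

```python
from typing import List, Sequence, Tuple


def feature_logits(w_dec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Score each vocabulary token by the dot product of the feature's decoder
    direction (length d_model) with that token's column of the unembedding.
    Assumed layout: W_U has d_model rows and vocab_size columns."""
    d_model = len(w_dec)
    vocab_size = len(W_U[0])
    return [sum(w_dec[i] * W_U[i][t] for i in range(d_model)) for t in range(vocab_size)]


def top_and_bottom(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (largest first) and the
    k lowest-scoring tokens (most negative first)."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Hypothetical toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["male", "women", "Parigi", "atimes"]
w_dec = [1.0, -0.5]
W_U = [
    [1.0, 0.8, -0.4, 0.1],
    [-0.2, 0.0, 0.6, 0.5],
]
scores = feature_logits(w_dec, W_U)
positive, negative = top_and_bottom(scores, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` and `heapq.nsmallest` would avoid a full sort.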
Activation density: 0.234%
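Activation density is usually read as the fraction of sampled token positions on which the feature fires. Under that reading, 0.234% corresponds to about 2,340 of every 1,000,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero, which is a common convention in sparse-autoencoder dashboards but is not stated on this page. The sample data is made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`.
    The default threshold of 0.0 is an assumption, not taken from the page."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 1,000 token positions, of which 2 have a positive activation.
acts = [0.0] * 1000
acts[10] = 3.2
acts[500] = 0.7
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.200%
```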