Explanations
gender-related terms and references
Negative Logits
Token            Logit
IVEREF           -0.81
EDEFAULT         -0.79
للاسماء          -0.79
Попис            -0.78
Plin             -0.76
})));            -0.74
صوتيه            -0.74
FormTagHelper    -0.72
='')             -0.71
}]               -0.70
Positive Logits
Token            Logit
gender            0.67
Gender            0.58
Gender            0.56
volent            0.51
gender            0.51
Male              0.49
ген               0.48
Men               0.47
Male              0.45
ogyn              0.44
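The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (most negative first) or up (most positive first). The card does not say how the values are computed. The following is a minimal sketch that assumes a feature's logit effect is its decoder direction multiplied by the model's unembedding matrix. `top_bottom_logits`, `w_dec_row`, `W_U` and `vocab` are hypothetical names used only for illustration.

```python
import numpy as np


def top_bottom_logits(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Hypothetical helper. Assumes the feature's logit effect is its decoder
    direction (shape: d_model) times the unembedding matrix (d_model, vocab).
    """
    effects = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending, so most negative comes first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 3-token vocabulary and an identity unembedding.
neg, pos = top_bottom_logits([0.5, -0.2, 0.1], np.eye(3), ["a", "b", "c"], k=1)
print(neg, pos)
```

Sorting once and slicing from both ends yields both lists in the same order the card uses: negative values ascending, positive values descending.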
Activation density: 0.191%
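A density of 0.191% means the feature fires on roughly 1.9 of every 1,000 tokens, or about 191 per 100,000. The following is a minimal sketch that assumes density is the fraction of token positions where the feature's activation is strictly above a threshold of zero. Both `activation_density` and the threshold are hypothetical choices, not taken from this card.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Hypothetical helper; assumes "density" means the share of tokens on
    which the feature is active (activation > threshold).
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# 191 active positions out of 100,000 tokens gives a density of 0.191%.
acts = np.zeros(100_000)
acts[:191] = 1.0
print(f"{activation_density(acts):.3%}")
```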