Explanations
male and female demographics
Negative Logits
Token     Logit
during    -0.95
没有      -0.84
when      -0.84
prevent   -0.82
....      -0.80
one       -0.79
drugs     -0.78
.....     -0.78
čilo      -0.78
親        -0.77
Positive Logits
Token     Logit
female    2.89
females   2.80
Female    2.30
female    2.27
Female    2.06
Females   2.03
women     2.02
male      1.90
ladies    1.80
girls     1.77
Activations Density 0.043%
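
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is greater than zero; the dashboard does not state its definition, so treat that as an assumption. The function name `activation_density` and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations):
    """Return the fraction of positions with activation > 0.

    Assumption: density = (positions where the feature fires) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 43 of which activate the feature.
acts = [1.5] * 43 + [0.0] * (100_000 - 43)
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.043% corresponds to the feature firing on roughly 43 out of every 100,000 token positions, which is consistent with a sparse feature.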