NEGATIVE LOGITS
Token          Logit
heterosexual   -0.09
vestib         -0.08
Steam          -0.08
'~             -0.08
leagues        -0.08
gay            -0.08
runt           -0.08
escorts        -0.08
semif          -0.08
"~             -0.08

POSITIVE LOGITS
Token          Logit
coefficients   0.11
coeff          0.11
Polynomial     0.11
polynomial     0.10
Polynomial     0.10
Fourier        0.10
Coe            0.09
_coeff         0.09
vanish         0.09
coeff          0.09

Activations Density: 0.034%