INDEX
Explanations
references to family dynamics and household interactions
Negative Logits
licit           -0.16
raya            -0.16
yat             -0.14
ayd             -0.14
isons           -0.14
addCriterion    -0.14
lateral         -0.14
orthand         -0.13
Disqus          -0.13
ÙħÙĨت          -0.13
POSITIVE LOGITS
extreme         0.57
ends            0.54
extrem          0.50
extremes        0.49
end             0.47
Extreme         0.44
opposite        0.42
Ends            0.41
ends            0.40
Extreme         0.40
Activations Density: 0.141%