INDEX
Explanations
comparisons or differences involving health outcomes or socioeconomic factors
comparative phrases indicating disparities or differences among groups
Negative Logits
Token        Logit
ema          -0.70
Animation    -0.67
huge         -0.67
iHUD         -0.67
Hopefully    -0.65
atis         -0.64
¯            -0.64
Hack         -0.63
Feeling      -0.63
ãĤ¡          -0.63
Positive Logits
Token          Logit
counterparts   1.16
those          1.13
controls       1.05
comparable     1.04
non            1.02
unex           1.02
untreated      1.02
unin           1.00
others         1.00
nont           1.00
Activations Density: 0.145%