INDEX
Explanations
words related to health, particularly focusing on identifying unhealthy aspects or choices
references to health and dietary quality, specifically focusing on unhealthy and healthier food options
Negative Logits
acqu          -0.70
arium         -0.69
borrowed      -0.67
Trials        -0.64
swoop         -0.64
Blade         -0.64
translation   -0.63
translated    -0.62
runners       -0.61
plun          -0.61
Positive Logits
iterranean     0.90
healthy        0.85
Eating         0.84
unhealthy      0.80
healthier      0.78
isot           0.77
Anthrop        0.76
lihood         0.74
igmatic        0.74
habits         0.73
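A minimal sketch of how token lists like the two above are commonly produced, assuming (as in many sparse-autoencoder feature dashboards, not confirmed by this page) that each score is the dot product of the feature's decoder direction with a token's unembedding vector, ranked from highest to lowest. The vectors in the usage example are hypothetical toy values, not the model's real weights.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens, each list ordered by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]          # largest scores first
    negative = ranked[::-1][:k]    # most negative scores first
    return positive, negative


# Hypothetical 2-dimensional toy example (not real model weights).
decoder_dir = [1.0, 0.0]
unembed = {
    "healthy": [0.9, 0.1],
    "unhealthy": [0.8, 0.0],
    "borrowed": [-0.7, 0.3],
    "translated": [-0.6, 0.2],
}
positive, negative = top_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('healthy', 0.9), ('unhealthy', 0.8)]
print(negative)  # [('borrowed', -0.7), ('translated', -0.6)]
```

Under this assumption, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes away from it.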
Activation Density: 0.011%
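A short sketch of the density figure, assuming it is defined as the fraction of evaluated tokens on which the feature's activation is greater than zero (the usual convention on feature dashboards; the page itself does not define it). A density of 0.011% corresponds to the feature firing on roughly 11 of every 100,000 tokens.

```python
from typing import List


def activation_density(activations: List[float]) -> float:
    """Fraction of tokens whose feature activation is strictly positive (assumed definition)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


# Hypothetical sample: the feature fires on 11 of 100,000 tokens.
acts = [0.0] * 99_989 + [1.0] * 11
print(f"{activation_density(acts) * 100:.3f}%")  # 0.011%
```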