INDEX
Explanations
phrases related to increased risk of medical conditions or mortality
references to various health risks
Negative Logits
elf        -0.78
Seasons    -0.75
Nap        -0.73
Bit        -0.72
ilver      -0.71
TERN       -0.69
eve        -0.68
FANTASY    -0.68
zeb        -0.67
ION        -0.67
Positive Logits
risks      0.93
risk       0.92
proble     0.84
hazards    0.83
iest       0.82
horm       0.82
mortality  0.81
aversion   0.79
risk       0.79
taking     0.78
Activations Density: 0.021%