INDEX
Explanations
mentions of "health" and related terms
mention of health-related topics or issues
Negative Logits
xual        -0.81
yip         -0.79
Helpful     -0.76
Reloaded    -0.75
arget       -0.73
Downing     -0.71
toe         -0.70
Duo         -0.70
Darkness    -0.69
selves      -0.68
Positive Logits
care        1.22
care        1.21
Care        1.04
insurance   1.02
Care        1.01
iest        0.95
aceutical   0.92
amacare     0.91
insurer     0.90
insurers    0.90
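Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. This dashboard does not say which procedure it uses, so the pure-Python code below is only a minimal sketch that assumes that convention. The function names and toy matrices are hypothetical.

```python
def logit_scores(decoder_row, W_U):
    # Assumed convention: score[t] = sum over d of decoder_row[d] * W_U[d][t],
    # i.e. the feature direction (length d_model) times the unembedding
    # matrix (d_model rows x vocab_size columns).
    vocab_size = len(W_U[0])
    return [
        sum(decoder_row[d] * W_U[d][t] for d in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    # Sort (token, score) pairs ascending; the tail gives the positive
    # logits, the head gives the negative logits.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[::-1][:k], ranked[:k]


# Toy example with a 2-dimensional model and a 3-token vocabulary.
scores = logit_scores([1.0, 0.0], [[0.5, -0.2, 1.0], [3.0, 3.0, 3.0]])
top, bottom = top_and_bottom(scores, ["a", "b", "c"], k=1)
print(top, bottom)
```

A real dashboard would run the same ranking over the full vocabulary with k = 10. This is why near-duplicate tokens such as "care" and "Care" can appear more than once: they are distinct vocabulary entries.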
Activation Density: 0.031%
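Activation density is usually read as the percentage of tokens on which the feature fires, meaning its activation is above zero. The figure above would then mean about 31 firing tokens per 100,000 (0.031% = 0.00031). The sketch below assumes that definition. The function name and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    # Assumed definition: percentage of tokens whose activation exceeds
    # the threshold (zero by default).
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: one of four tokens fires.
density = activation_density([0.0, 0.0, 0.5, 0.0])
print(density)
```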