Explanations
mentions of the word "tobacco"
references to tobacco and its associated health impacts
Negative Logits
ithmetic      -0.76
Wynne         -0.75
Lazarus       -0.70
infeld        -0.69
Zup           -0.68
Gamergate     -0.68
Ake           -0.68
Defenders     -0.68
variable      -0.68
olon          -0.67
Positive Logits
cigarettes    1.11
smoking       0.98
smoke         0.96
cessation     0.93
cigarette     0.91
smoked        0.85
cig           0.84
tobacco       0.84
arette        0.82
puff          0.81
Activations Density: 0.012%