Explanations
words related to tobacco and nicotine
terms related to tobacco and its effects
Negative Logits
Wynne       -0.76
Ake         -0.72
Defenders   -0.70
Lazarus     -0.69
infeld      -0.69
ithmetic    -0.68
Gamergate   -0.68
imov        -0.65
inosaur     -0.65
olon        -0.65
Positive Logits
cigarettes  1.09
smoking     0.99
smoke       0.97
cessation   0.95
cigarette   0.93
arette      0.86
smoked      0.85
tobacco     0.85
puff        0.85
cigarette   0.82
Activations Density: 0.010%