INDEX
Explanations
references to cigarettes and related terms
mentions of cigarettes and their related products
Negative Logits
*/(         -0.78
herty       -0.77
isite       -0.73
pmwiki      -0.72
oral        -0.70
icter       -0.70
yss         -0.70
UFC         -0.69
variable    -0.67
ede         -0.67
Positive Logits
arette      1.34
arettes     1.24
cigarettes  1.19
smoking     1.13
smokers     1.10
cigarette   1.08
smoker      1.08
cig         1.06
smoked      0.97
cigarettes  0.95
Activations Density 0.009%