Explanations
the word "cigarette" at different points in the text
references to cigarettes and smoking
Negative Logits
*/(          -0.85
pmwiki       -0.76
isite        -0.74
ede          -0.72
alon         -0.71
herty        -0.67
UFC          -0.66
eele         -0.66
Yates        -0.66
ology        -0.65
Positive Logits
arette        1.30
arettes       1.26
cigarettes    1.22
cigarette     1.12
cig           1.11
smoking       1.11
smokers       1.10
smoker        1.09
smoked        0.97
puff          0.96
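The two lists above are the tokens with the most negative and most positive logit effects. The sketch below shows how such lists could be ranked. It assumes each token's effect score has already been computed. One common approach, assumed here and not confirmed by this page, is the dot product of the feature's decoder direction with that token's unembedding vector. The function name `rank_logits` is hypothetical, and the sample dictionary reuses a subset of the values listed above.

```python
from typing import Dict, List, Tuple


def rank_logits(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs.

    Hypothetical helper: `effects` is assumed to map each token to a
    precomputed logit-effect score.
    """
    ordered = sorted(effects.items(), key=lambda item: item[1])  # ascending
    negative = ordered[:k]          # most negative first
    positive = ordered[::-1][:k]    # most positive first
    return negative, positive


# A subset of the scores listed above.
effects = {
    "*/(": -0.85, "pmwiki": -0.76, "isite": -0.74,
    "arette": 1.30, "arettes": 1.26, "cigarettes": 1.22, "puff": 0.96,
}
neg, pos = rank_logits(effects, k=3)
print(neg)  # [('*/(', -0.85), ('pmwiki', -0.76), ('isite', -0.74)]
print(pos)  # [('arette', 1.3), ('arettes', 1.26), ('cigarettes', 1.22)]
```

Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other. Sorting costs O(n log n), which is negligible next to computing the scores for a full vocabulary.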
Activation Density: 0.009%
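A density of 0.009% means the feature is active on roughly 9 of every 100,000 token positions (0.009 / 100 = 9 × 10⁻⁵). The sketch below shows the arithmetic. It assumes "active" means an activation strictly above zero, which is typical for ReLU-style features but is not stated on this page. The activation values are made up for illustration, and the helper name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds `threshold`.

    Hypothetical helper; the default threshold of 0.0 is an assumption.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up example: the feature fires on 9 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 1000] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.009%
```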