INDEX
Explanations
mentions of smoking
instances of the word "smoke" in various forms and contexts
Negative Logits
Lama -0.64
MacArthur -0.62
quo -0.61
Xavier -0.60
rants -0.59
headaches -0.58
thus -0.57
Graphics -0.57
Veterans -0.56
unmarked -0.56
Positive Logits
iley 1.45
ooth 1.42
oky 1.31
ugg 1.28
okers 1.28
itten 1.25
oker 1.24
oking 1.23
okes 1.21
okin 1.21
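Lists of positive and negative logits like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The page does not state how its numbers were computed, so the following is a minimal sketch under that assumption. The names `w_dec`, `W_U`, and `top_logit_tokens`, and the toy vocabulary, are hypothetical and for illustration only. The sketch also ignores any final layer norm or other scaling a real dashboard might apply.

```python
import numpy as np

def top_logit_tokens(w_dec, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumed shapes (hypothetical, for illustration):
      w_dec: (d_model,) decoder direction of the feature
      W_U:   (d_model, vocab_size) unembedding matrix
      vocab: list of vocab_size token strings
    """
    logits = w_dec @ W_U                 # (vocab_size,) logit contribution per token
    order = np.argsort(logits)           # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["oky", "oking", "thus", "Lama"]
W_U = np.array([[1.3, 1.2, -0.5, -0.6],
                [0.0, 0.0, 0.0, 0.0]])
w_dec = np.array([1.0, 0.0])
neg, pos = top_logit_tokens(w_dec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and reading both ends of the order gives the negative and positive lists in a single pass. With a real model, `k=10` would match the ten entries shown per list here.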
Activation density: 0.013%
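Activation density is usually reported as the fraction of token positions in some evaluation dataset on which the feature is active. The page names neither the dataset nor the threshold. The sketch below assumes a feature counts as active when its activation is strictly greater than zero, as with a ReLU sparse autoencoder. The function name `activation_density` and the toy data are hypothetical.

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions where the feature fires.

    Assumes (hypothetically) that "active" means activation > 0.
    """
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy data: the feature fires on 13 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:13] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```

Under these assumptions, a density of 0.013% corresponds to roughly 13 active positions per 100,000 tokens. That is consistent with a narrowly targeted feature such as the smoking-related one described above.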