Explanations
words related to tobacco
references to tobacco and nicotine
Negative Logits
unity       -0.77
{\          -0.72
Cry         -0.70
Photos      -0.69
amorph      -0.69
Kal         -0.68
ensional    -0.68
plex        -0.68
Syl         -0.67
Shards      -0.67
Positive Logits
tobacco      3.73
Tobacco      3.08
cigarette    2.16
nicotine     2.13
cigarettes   2.11
smoking      1.95
tob          1.92
smokers      1.82
Nicotine     1.82
smoking      1.81
Activations Density: 0.012%