INDEX
    Explanations

    mentions of smoking-related terms

    references to smoking and its effects

    Negative Logits
    Token            Logit
    "assian"         -0.81
    "tell"           -0.70
    " Defenders"     -0.69
    "translation"    -0.69
    " Vector"        -0.67
    "telling"        -0.66
    "Philipp"        -0.65
    " Nou"           -0.65
    "ousse"          -0.65
    "HCR"            -0.64
    Positive Logits
    Token            Logit
    " cessation"     1.31
    " smoking"       1.19
    " smoked"        1.08
    " smoker"        1.08
    " cigarettes"    1.07
    " smoke"         1.03
    " cigars"        1.02
    " habits"        0.96
    " tobacco"       0.95
    " smokers"       0.93
    Activation Density: 0.015%

    No Known Activations