    Negative Logits
    smoking     -0.95
     Smoking    -0.91
     Smo        -0.91
     smoking    -0.91
     smoker     -0.90
    Smo         -0.87
     smokes     -0.85
    Smoking     -0.84
     smokers    -0.81
     smoke      -0.78
    Positive Logits
    '])->       0.59
    })->        0.55
     مرئيه      0.53
    ]');        0.48
    पया         0.47
    ')->        0.47
    Gweler      0.47
    .           0.46
    ]').        0.45
    ")->        0.43
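The two lists rank tokens by how strongly this feature pushes their logits down or up. One common way such lists are produced for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive entries. Whether this dashboard uses exactly that procedure, or applies any normalization, is an assumption here. The minimal Python sketch below only illustrates the idea; the helpers logit_effects and top_and_bottom and all weights are hypothetical toy values.

```python
# Hypothetical sketch of how a feature's negative/positive logit lists could be
# derived. Assumption: the feature's effect on token j is the dot product of its
# decoder direction with column j of the unembedding matrix. All weights below
# are toy values, not taken from any real model.
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # shape [d_model][vocab_size]
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Map each token to dot(decoder_dir, unembed[:, j])."""
    return {
        tok: sum(decoder_dir[k] * unembed[k][j] for k in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (note the leading spaces).
vocab = [" smoking", " smoke", ")->", "."]
decoder_dir = [1.0, -0.5]
unembed = [
    [-0.6, -0.4, 0.3, 0.2],  # row for model dimension 0
    [0.6, 0.5, -0.4, -0.1],  # row for model dimension 1
]
effects = logit_effects(decoder_dir, unembed, vocab)
negative, positive = top_and_bottom(effects, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

With these toy weights, " smoking" and " smoke" land in the negative list while ")->" and "." land in the positive list; a real dashboard would run the same ranking over the full vocabulary.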
    Act Density 0.020%
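Act Density is assumed here to mean the fraction of sampled tokens on which the feature's activation is strictly positive, so 0.020% would correspond to roughly 2 firing tokens per 10,000. The short Python sketch below shows that arithmetic; the activation_density helper and the activation values are hypothetical and made up for illustration.

```python
# Hypothetical sketch: "activation density" is assumed to be the share of
# sampled tokens on which the feature's activation is strictly positive.
# The activations below are invented, not real dashboard data.
from typing import Iterable


def activation_density(activations: Iterable[float]) -> float:
    """Fraction of tokens with activation > 0."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > 0:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return active / total


# 2 firing tokens out of 10,000 sampled tokens.
acts = [0.0] * 10_000
acts[17] = 1.3
acts[4_242] = 0.4
density = activation_density(acts)
print(f"Act Density {density:.3%}")  # prints: Act Density 0.020%
```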

    No Known Activations