INDEX
    Explanations

    references to drugging or related terms

    Negative Logits
    Token             Logit
    "ray"             -0.79
    " coloured"       -0.76
    "road"            -0.69
    " Pastebin"       -0.68
    "rir"             -0.68
    "ution"           -0.68
    " Sabha"          -0.68
    "rique"           -0.67
    "aceutical"       -0.66
    "pite"            -0.65
    Positive Logits
    Token             Logit
    " dru"            1.14
    " squat"          0.70
    " scaling"        0.70
    "advertising"     0.68
    " dup"            0.68
    " raping"         0.63
    " doub"           0.63
    " submar"         0.62
    " elig"           0.62
    " dumping"        0.61
    Activation Density: 0.001%

    No Known Activations