INDEX
    Explanations

    phrases related to scams

    instances of the word "spam."

    Negative Logits
    REDACTED      -0.68
     Vernon       -0.67
     dimensions   -0.64
     tremend      -0.63
     Guinness     -0.62
     spirits      -0.62
     Sketch       -0.60
     presence     -0.59
    é»Ĵ           -0.59
     Spirits      -0.59

    Positive Logits
    pling          1.13
    sterdam        1.04
    nesty          1.03
    ilial          1.00
    ilies          0.99
    ulet           0.95
    azing          0.95
    bitious        0.94
    utation        0.93
    essage         0.93
    Act Density 0.016%

    No Known Activations