INDEX
    Explanations

    instances of the term "spam" in relation to unwanted or irrelevant content

    words related to fraudulent activities

    Negative Logits
    Token             Logit
    "ONSORED"         -0.67
    " kinderg"        -0.66
    "WIND"            -0.66
    " anticipation"   -0.61
    "yi"              -0.61
    " tremend"        -0.61
    " spirits"        -0.61
    " unders"         -0.59
    "REDACTED"        -0.58
    " cooled"         -0.58
    Positive Logits
    Token             Logit
    "pling"           1.16
    "nesty"           1.11
    "ilies"           1.04
    "sterdam"         1.03
    "elia"            1.01
    "pering"          1.01
    "ilial"           1.00
    "ilar"            0.99
    "amia"            0.97
    "essage"          0.95
    Activation Density: 0.021%

    No Known Activations