INDEX
    Explanations

    explicitly censored profanity

    explicit language and strong profanity

    Negative Logits
    " mosqu"          -0.69
    " conduc"         -0.68
    " elig"           -0.66
    " isolation"      -0.66
    "Buyable"         -0.63
    " waivers"        -0.61
    " Agric"          -0.61
    "VB"              -0.60
    " Annotations"    -0.58
    " unsupported"    -0.58
    Positive Logits
    "cking"           1.19
    "king"            1.15
    "kers"            1.13
    "ked"             1.13
    "tty"             1.06
    "gger"            1.04
    "shit"            1.03
    "tch"             1.02
    "ker"             0.98
    "k"               0.97
    Activation Density: 0.047%

    No Known Activations