INDEX
    Explanations

    inappropriate content such as vulgar language

    instances of vulgar language

    Negative Logits
    "Winged"       -0.73
    "atform"       -0.70
    "ulin"         -0.64
    "eger"         -0.63
    "aah"          -0.63
    "umbledore"    -0.62
    "DOC"          -0.61
    "UL"           -0.61
    "WIND"         -0.60
    "ulation"      -0.60

    Positive Logits
    "folk"         0.77
    "eric"         0.72
    "lists"        0.71
    " Strait"      0.69
    "yang"         0.65
    "pend"         0.64
    " trader"      0.63
    "roth"         0.62
    " cousin"      0.62
    "shit"         0.62

    Activation Density 0.000%

    No Known Activations