    Explanations

    explicit or graphic content

    references to social and cultural commentary, often with informal and humorous undertones

    Negative Logits
    Token                   Logit
    "isSpecialOrderable"    -0.73
    "asury"                 -0.69
    "Reward"                -0.67
    " safegu"               -0.66
    "Import"                -0.66
    "åĬ"                    -0.66
    " qualitative"          -0.65
    "Clear"                 -0.65
    "¿½"                    -0.65
    " Regulatory"           -0.63
    Positive Logits
    Token       Logit
    " lol"      1.41
    " haha"     1.39
    " ;)"       1.31
    " LOL"      1.31
    "?!"        1.30
    "!!!!"      1.27
    "???"       1.23
    "!!!!!"     1.23
    "!!!"       1.22
    " shit"     1.22
    Activation Density: 0.618%

    No Known Activations