INDEX
    Explanations

    warnings and disclaimers in texts

    warnings about explicit or graphic content in media

    Negative Logits
    inguished      -0.71
    luaj           -0.69
    Redditor       -0.68
    sonian         -0.67
    elligent       -0.66
    Hon            -0.66
    annie          -0.65
     srfAttach     -0.65
    SPONSORED      -0.64
    itage          -0.63
    Positive Logits
     *)            0.94
     spoilers      0.91
     assumes       0.84
    !]             0.84
    OIL            0.83
     ALWAYS        0.83
     .)            0.79
     formatting    0.76
    )*             0.76
     RAW           0.76
    Act Density 0.346%

    No Known Activations