INDEX
    Explanations

    content that violates community guidelines or contains harmful behavior

    Negative Logits
    "TexParameter"    -0.14
    "efon"            -0.14
    " ìĭ¤í"           -0.14
    "igos"            -0.14
    " RedirectTo"     -0.14
    "stery"           -0.14
    "WithEmail"       -0.14
    "iosper"          -0.14
    "flix"            -0.13
    "HEMA"            -0.13

    Positive Logits
    " offensive"       0.26
    " inflammatory"    0.22
    " def"             0.22
    " objection"       0.22
    " hate"            0.22
    " lib"             0.21
    " copyrighted"     0.21
    "obj"              0.20
    " Offensive"       0.20
    " violent"         0.19

    Act Density 0.092%

    No Known Activations