INDEX
    Explanations

    content related to classification ratings and appropriateness for young audiences

    Negative Logits
    Token       Logit
    "endi"      -0.15
    "è°±"       -0.14
    "ropol"     -0.14
    "plits"     -0.14
    "ÑĥÑħ"      -0.14
    "inand"     -0.14
    "itra"      -0.14
    "ragen"     -0.13
    " Roch"     -0.13
    "ortex"     -0.13
    Positive Logits
    Token         Logit
    " violence"   0.31
    " Violence"   0.28
    "Viol"        0.25
    "viol"        0.24
    " violent"    0.23
    " content"    0.22
    " Viol"       0.22
    " adult"      0.21
    "-viol"       0.21
    " viol"       0.20
    Act Density: 0.190%

    No Known Activations