INDEX
    Explanations

    references to blood or violence

    references to violent or graphic imagery

    Negative Logits
    PLIED        -0.85
    Reviewer     -0.81
    Demand       -0.81
    BOOK         -0.80
    rador        -0.77
    Rate         -0.76
    anol         -0.74
    YL           -0.73
    agall        -0.73
    Recomm       -0.70
    Positive Logits
     bloody       0.83
     noses        0.82
     wounds       0.76
    soever        0.73
     streak       0.72
    ãĥ£           0.71
     bast         0.71
    heart         0.70
     diarrhea     0.70
     swath        0.69
    Act Density 0.019%

    No Known Activations