INDEX
    Explanations

    words related to violence or intense negative experiences

    references to violent or distressing imagery

    Negative Logits
    Token          Logit
    "PLIED"        -0.88
    "Reviewer"     -0.88
    "anol"         -0.87
    "Demand"       -0.86
    "Recommend"    -0.78
    "Rate"         -0.77
    "rador"        -0.76
    "BOOK"         -0.75
    "later"        -0.75
    "CHAT"         -0.75
    Positive Logits
    Token          Logit
    " bloody"      0.93
    " noses"       0.83
    " wounds"      0.77
    " bast"        0.77
    " slaughter"   0.74
    " swath"       0.74
    " prick"       0.72
    "stained"      0.72
    " swat"        0.72
    " blood"       0.71
    Act Density 0.011%

    No Known Activations