INDEX
    Explanations

    violence-related phrases involving physical harm and law enforcement

    references to violent incidents or fatalities

    Negative Logits
    retty             -0.55
    awaru             -0.52
    udos              -0.51
     Vaugh            -0.50
     conclud          -0.50
    soDeliveryDate    -0.49
     incorpor         -0.49
     furthermore      -0.49
     however          -0.48
     moreover         -0.48
    Positive Logits
    )?                0.64
    ?",               0.59
     \'               0.53
    apor              0.52
     ])               0.52
    )|                0.51
    Ħ¢                0.50
    )]                0.49
    their             0.49
    ?),               0.49
    Activation Density: 2.041%

    No Known Activations