    Explanations

    references to violence and threats to safety

    Attacking, targeting, or harming others

    violence against innocents

    Negative Logits
    Token               Logit
    "ArgsConstructor"   -0.62
    "Fprintf"           -0.54
    "יצוני"             -0.53
    "strerror"          -0.48
    " igény"            -0.46
    "toHave"            -0.46
    " tác"              -0.45
    " FontWeight"       -0.45
    " számára"          -0.44
    " gestos"           -0.44
    Positive Logits
    Token               Logit
    " unsuspecting"     1.35
    " innocent"         1.20
    " defen"            1.07
    "innocent"          1.00
    " inocente"         0.94
    " unprotected"      0.91
    " indiscrimin"      0.90
    " helpless"         0.90
    " innoc"            0.88
    " hapless"          0.87
    Activation Density: 0.488%

    No Known Activations