INDEX
    Explanations

    references to violence and aggression in various contexts

    Negative Logits
    Fprintf          -0.58
    ArgsConstructor  -0.57
    יצוני            -0.50
    strerror         -0.47
    clic             -0.44
    řád              -0.43
    ylus             -0.43
    OfDay            -0.43
                     -0.42
    <_>              -0.42
    Positive Logits
     unsuspecting    1.34
     innocent        1.18
     defen           1.09
    innocent         1.00
     targets         0.98
     inocente        0.95
     vulnerable      0.94
     helpless        0.91
     unprotected     0.91
     innoc           0.89
    Activation Density: 0.593%

    No Known Activations