    Explanations

    references to violent actions and military conflicts

    Negative Logits
    Token               Logit
    "ArgsConstructor"   -0.64
    "ologues"           -0.55
    " FontWeight"       -0.54
    "יצוני"             -0.54
    "Fprintf"           -0.53
    "strerror"          -0.52
    " kasarigan"        -0.50
    " alapján"          -0.47
    " ää"               -0.47
    " tác"              -0.47
    Positive Logits
    Token               Logit
    " unsuspecting"     1.41
    " innocent"         1.36
    " defen"            1.30
    "innocent"          1.12
    " inocente"         1.07
    " helpless"         1.04
    " innoc"            0.99
    " unarmed"          0.98
    " targets"          0.97
    " hapless"          0.93
    Activation Density: 0.502%

    No Known Activations