INDEX
    Explanations

    phrases related to physical aggression and physical interactions

    instances of physical aggression or violence

    Negative Logits
    Token             Logit
    " Printed"        -0.71
    " Ended"          -0.63
    "ories"           -0.63
    " WRITE"          -0.62
    " substituted"    -0.62
    "Reviewed"        -0.62
    "ifact"           -0.60
    "ieties"          -0.59
    "Po"              -0.58
    "inished"         -0.58

    Positive Logits
    Token             Logit
    " whom"            0.73
    "adem"             0.73
    " affection"       0.70
    "illac"            0.67
    " dur"             0.66
    "assad"            0.65
    "oba"              0.65
    " advoc"           0.65
    " anat"            0.62
    "ubes"             0.62

    Activation Density: 0.777%

    No Known Activations