INDEX
    Explanations

    mentions of aggressive behavior or aggression-related terms

    terms related to aggressive behaviors and violence

    Negative Logits
    "FORMATION"   -0.76
    "zl"          -0.74
    "verend"      -0.74
    "HCR"         -0.70
    "obook"       -0.68
    "lev"         -0.67
    " Bake"       -0.67
    "ummer"       -0.66
    "haul"        -0.65
    "aver"        -0.65
    Positive Logits
    " aggression"    0.89
    " against"       0.85
    " aggress"       0.82
    " towards"       0.82
    " toward"        0.79
    " posture"       0.78
    "iveness"        0.77
    " escalation"    0.76
    " provocation"   0.76
    "Agg"            0.73
    Activation Density: 0.053%

    No Known Activations