INDEX
    Explanations

    the word "attacks"

    terms related to aggressive actions or threats, specifically the word "attacks"

    occurrences of the word "attacks" and its variations in the text

    Negative Logits
    Token         Logit
    " Combine"    -0.66
    "OVA"         -0.66
    "dit"         -0.65
    " Vale"       -0.63
    "YC"          -0.62
    " Harmon"     -0.62
    "theless"     -0.61
    " Hemp"       -0.60
    " Ment"       -0.59
    " Mole"       -0.59
    Positive Logits
    Token           Logit
    " attacks"      1.08
    "attack"        1.04
    "attacks"       0.95
    " attack"       0.93
    " Attacks"      0.85
    " attackers"    0.83
    "Attack"        0.81
    "etting"        0.79
    "iveness"       0.79
    "pread"         0.78
    Activation Density: 0.023%

    No Known Activations