INDEX
    Explanations

    phrases containing the word "attacked"

    instances of the word "attacked."

    NEGATIVE LOGITS
    Token      Logit
    "Vert"     -0.72
    "val"      -0.65
    "YC"       -0.63
    "tz"       -0.63
    "shown"    -0.61
    "atom"     -0.61
    "sa"       -0.61
    " vert"    -0.61
    "flu"      -0.61
    "aver"     -0.61
    POSITIVE LOGITS
    Token          Logit
    "attack"       1.03
    " attacked"    0.94
    " attacks"     0.94
    " attackers"   0.89
    "oise"         0.89
    " attack"      0.87
    "ritch"        0.86
    " attacking"   0.85
    "ivated"       0.82
    "Attack"       0.80
    Activation Density: 0.015%

    No Known Activations