INDEX
    Explanations

    phrases related to being physically impacted or attacked, often with negative outcomes

    phrases indicating actions or events that are associated with being affected or impacted

    Negative Logits
    "spection"    -0.80
    "taboola"     -0.80
    "nces"        -0.73
    "ĨĴ"          -0.72
    "SPONSORED"   -0.70
    "ruary"       -0.69
    "theless"     -0.68
    "sylv"        -0.67
    "iltr"        -0.67
    "inois"       -0.66
    Positive Logits
    "ritic"        0.80
    " tails"       0.70
    " lightning"   0.69
    " snag"        0.69
    "henko"        0.67
    " runoff"      0.66
    " plateau"     0.65
    " stride"      0.65
    "crazy"        0.65
    " missiles"    0.65
    Activation Density: 0.468%

    No Known Activations