Explanations
phrases related to being physically impacted or attacked, often with negative outcomes
phrases indicating actions or events that are associated with being affected or impacted
Negative Logits
Token        Logit
spection     -0.80
taboola      -0.80
nces         -0.73
ĨĴ           -0.72
SPONSORED    -0.70
ruary        -0.69
theless      -0.68
sylv         -0.67
iltr         -0.67
inois        -0.66
Positive Logits
Token        Logit
ritic        0.80
tails        0.70
lightning    0.69
snag         0.69
henko        0.67
runoff       0.66
plateau      0.65
stride       0.65
crazy        0.65
missiles     0.65
Activations Density: 0.468%