Explanations
words related to inciting anger or conflict
references to actions or events that incite conflict or tension
Negative Logits
Accounting    -0.88
ummer         -0.74
haul          -0.72
aum           -0.72
rake          -0.70
amacare       -0.70
ultz          -0.66
ewater        -0.66
oard          -0.66
fficiency     -0.65
Positive Logits
provocation   1.16
provoke       1.02
provoking     0.97
bystanders    0.88
provocative   0.88
provoked      0.83
prov          0.81
aggression    0.79
sidx          0.79
escalation    0.79
Activations Density: 0.033%