INDEX
Explanations
phrases related to escalating conflicts and violent situations
Head Attr Weights
0:0.06
1:0.03
2:0.13
3:0.07
4:0.17
5:0.05
6:0.03
7:0.03
8:0.16
9:0.14
10:0.06
11:0.02
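
A minimal sketch for reading the weights above: it ranks the heads by attribution weight and reports what share of the listed total the top few carry. The values are copied from the list; the names `head_weights` and `top_heads` are hypothetical and not part of the dashboard.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_weights = {
    0: 0.06, 1: 0.03, 2: 0.13, 3: 0.07, 4: 0.17, 5: 0.05,
    6: 0.03, 7: 0.03, 8: 0.16, 9: 0.14, 10: 0.06, 11: 0.02,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_weights)
total = sum(head_weights.values())
# Fraction of the listed total carried by the top-k heads.
share = sum(w for _, w in top) / total

print("top heads:", [h for h, _ in top])
print(f"share of listed total: {share:.2f}")
```

With these values, heads 4, 8 and 9 rank highest. The listed weights are rounded to two decimals, so the total need not come to exactly 1.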
Negative Logits
stone: -1.33
prints: -1.30
Score: -1.26
ked: -1.25
maker: -1.23
liest: -1.23
boys: -1.20
builders: -1.19
girls: -1.19
stones: -1.18
Positive Logits
hostilities: 1.72
ACTIONS: 1.54
escalation: 1.53
tensions: 1.51
confrontation: 1.50
escal: 1.47
ricular: 1.40
��: 1.40
escalating: 1.40
dialogue: 1.38
Activations Density 0.009%