Explanations
words related to hostile actions or behaviors
instances of the word "aggression" and its related forms
Negative Logits
aver        -0.84
abol        -0.74
lev         -0.73
NCT         -0.72
ectar       -0.70
••          -0.70
FORMATION   -0.69
view        -0.67
rote        -0.67
orah        -0.66
Positive Logits
aggression      1.36
aggress         1.10
provocation     1.01
Agg             0.93
escalation      0.86
aggressively    0.82
iveness         0.80
towards         0.80
retaliation     0.77
escalate        0.77
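Positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that method; the function names `logit_effects` and `top_and_bottom`, and all vectors in the toy example, are hypothetical and not taken from this page or any particular tool.

```python
from typing import List, Sequence, Tuple

Score = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Score]:
    """Score every vocabulary token by the dot product decoder_dir . unembed[:, j].

    decoder_dir: length-d feature direction (hypothetical input).
    unembed: d x V unembedding matrix, one column per token.
    """
    d = len(decoder_dir)
    scores: List[Score] = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d))
        scores.append((token, score))
    return scores


def top_and_bottom(scores: List[Score], k: int) -> Tuple[List[Score], List[Score]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    ranked = sorted(scores, key=lambda ts: ts[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with d = 2 and a four-token vocabulary (values are made up).
vocab = ["aggression", "towards", "view", "aver"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.5, -0.5, -1.0],
    [0.5, 0.5, -0.25, 0.0],
]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print(positive)  # [('aggression', 1.25), ('towards', 0.75)]
print(negative)  # [('aver', -1.0), ('view', -0.625)]
```

In a real model the same ranking would run over the full vocabulary with k = 10, which is the length of each list on this page.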
Activation Density: 0.015%
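Activation density is the fraction of tokens on which the feature fires, so 0.015% corresponds to roughly 1.5 activating tokens per 10,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold behind the figure above is not stated); `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 20,000 tokens.
acts = [0.0] * 20_000
for idx in (17, 5_042, 13_999):
    acts[idx] = 2.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.015%
```

A feature this sparse fires on very few tokens, which is consistent with the narrow "aggression" explanations listed above.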