INDEX
Explanations
terms related to aggressive behavior or actions
instances and discussions of aggression
Negative Logits
FORMATION    -0.80
aver         -0.76
ectar        -0.72
Rated        -0.70
van          -0.67
rote         -0.66
Vita         -0.66
âĢ¢âĢ¢       -0.65
Mour         -0.65
clips        -0.64
Positive Logits
aggression      1.15
aggress         0.93
iveness         0.86
provocation     0.82
towards         0.82
toward          0.81
aggressively    0.81
escalation      0.79
Agg             0.79
whine           0.75
Activations Density 0.011%
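
The figures above can be read with a small sketch like the following. It rests on assumptions, since the page does not state its definitions: activation density is taken to be the percentage of sampled token positions where the feature activation exceeds a threshold, and the logit lists are taken to be the ten lowest and ten highest entries of a per-token logit effect. The function names `activation_density` and `rank_logits` are hypothetical and do not come from any tool.

```python
from typing import Dict, List, Tuple

TokenScore = Tuple[str, float]


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above `threshold`.

    Assumption: "density" means the share of sampled tokens with a nonzero
    activation; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


def rank_logits(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return (most negative, most positive) tokens by logit effect.

    Assumption: `logit_effects` maps each vocabulary token to the change in
    its output logit attributed to the feature.
    """
    ordered = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ordered[:k]          # lowest values first
    positive = ordered[::-1][:k]    # highest values first
    return negative, positive


# Usage with a few values copied from the listings above.
effects = {"aggression": 1.15, "FORMATION": -0.80, "aggress": 0.93, "aver": -0.76}
negative, positive = rank_logits(effects, k=2)
density = activation_density([0.0, 0.0, 0.0, 1.5])
```

Under these assumptions, a density of 0.011% would mean the feature fires on roughly 1 in 9,000 sampled tokens.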