Explanations
terms related to aggression, specifically instances of the word "aggressive" with varying intensities
references to aggressive behavior or characteristics
Negative Logits
ĸļ            -0.94
obyl          -0.85
owship        -0.83
psons         -0.80
udder         -0.77
Alive         -0.75
ãĤ´ãĥ³        -0.72
mingham       -0.71
artifacts     -0.71
Vert          -0.70
Positive Logits
aggressive    0.90
toward        0.84
aggressively  0.80
posture       0.79
aggress       0.79
behavior      0.78
aggression    0.77
tactics       0.76
ressive       0.76
towards       0.76
Activations Density 0.052%