Explanations
phrases related to various forms of attacks and attackers
references to individuals involved in violent actions or events
Negative Logits
zl          -0.97
umph        -0.84
Balt        -0.84
Revival     -0.76
sit         -0.75
urgical     -0.71
wheel       -0.71
rebirth     -0.69
urb         -0.69
indust      -0.67
Positive Logits
attacker    0.86
attackers   0.85
beware      0.80
wielding    0.76
attacked    0.75
assailants  0.75
assailant   0.72
intent      0.72
who         0.71
rapist      0.71
Activations Density: 0.029%