INDEX
Explanations
words related to physical violent actions or events
references to violent attacks
Negative Logits
ETA           -0.73
dit           -0.72
YC            -0.71
zl            -0.67
ォ            -0.66
theless       -0.65
Genie         -0.63
FORMATION     -0.62
Juven         -0.62
UTION         -0.60
Positive Logits
attacks       0.86
iveness       0.85
perpetrated   0.85
against       0.85
ivist         0.84
spree         0.83
attack        0.81
attack        0.81
abad          0.80
ivity         0.79
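The positive and negative logit lists appear to rank vocabulary tokens by how strongly this feature raises or lowers their output logits. The following is a minimal sketch of how such lists could be produced. It assumes something common for sparse-autoencoder dashboards but not stated here: that each token's value is the dot product of the feature's decoder direction with that token's unembedding vector. The function names `feature_logits` and `top_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def feature_logits(decoder_dir: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed definition: project the feature's decoder direction onto
    each token's unembedding vector (a plain dot product per token)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_bottom(logits: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical 2-d toy model; these are not real model weights.
    toy_unembed = {
        "tok_a": [0.8, 0.1],
        "tok_b": [0.5, 0.0],
        "tok_c": [-0.6, 0.3],
        "tok_d": [-0.7, 0.2],
    }
    pos, neg = top_bottom(feature_logits([1.0, 0.0], toy_unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

In a real dashboard the unembedding matrix covers the full vocabulary. Sorting once and slicing both ends avoids computing the two lists separately.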
Activation Density: 0.036%
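Activation density is usually read as the fraction of token positions on which the feature fires. The following minimal sketch assumes that definition, with "fires" meaning an activation above a threshold of zero; the dashboard itself does not specify either. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`
    (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 36 firing positions out of 100,000 tokens.
    sample = [0.0] * 99_964 + [1.0] * 36
    density = activation_density(sample)
    print(f"{density:.3%}")
```

Under this reading, 0.036% means the feature fires on roughly one token in 2,800, which makes it a sparse feature.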