Explanations
potential motives and triggers for violent or controversial actions
themes related to violence and its underlying motives
Negative Logits
anwhile    -0.56
ajor       -0.53
"!         -0.53
Marg       -0.52
lishes     -0.52
.}         -0.49
aut        -0.49
ngth       -0.48
oya        -0.48
Morning    -0.48
Positive Logits
?,         1.14
,[         1.01
*,         1.00
(),        0.95
/,         0.86
!,         0.82
,          0.82
$,         0.81
,...       0.79
+,         0.79
Activations Density 1.228%