Explanations
references to violent incidents and casualties
Negative Logits
raid          -0.16
destruct      -0.16
çĹ            -0.15
ENCH          -0.15
destructive   -0.15
REAK          -0.15
quare         -0.14
usta          -0.14
Destruction   -0.14
.managed      -0.14
Positive Logits
killed        0.35
Killed        0.26
electro       0.26
gun           0.24
shot          0.23
-k            0.22
sense         0.21
critically    0.21
pronounced    0.21
ki            0.21
Activations Density 0.111%