INDEX
Explanations
phrases related to killing or lethal actions
Neuron Alignment
Index    Value    % of L₁
765      +0.15    0.6%
1222     +0.13    0.5%
130      +0.12    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
765      +0.15       0.05
130      +0.13       0.05
1479     +0.12       0.03
Negative Logits
solidar     -0.63
stoff       -0.61
indeb       -0.57
Bourgoin    -0.54
unie        -0.53
impon       -0.53
cepan       -0.52
fras        -0.52
emoc        -0.52
zó          -0.50
Positive Logits
kill       1.17
killing    1.09
Kill       1.07
killed     1.04
Kill       1.04
kills      1.02
kill       1.00
Killing    0.99
KILL       0.99
killing    0.94
Activations Density: 0.130%