Explanations
mentions of wars and military conflicts
Neuron Alignment
Index    Value    % of L₁
517      +0.17    0.6%
406      +0.14    0.5%
1870     +0.11    0.4%
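
A minimal sketch of how a "% of L₁" column could be computed, assuming it reports each neuron's absolute weight as a share of the L₁ norm of the feature's weight vector. That reading is an assumption, not something the dashboard states. The rows above are at least consistent with it, since 0.17 / 0.006, 0.14 / 0.005 and 0.11 / 0.004 all imply an L₁ norm of roughly 28. The function name `l1_share` and the toy vector are hypothetical.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means the share of the total absolute weight
    carried by a single neuron.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical 4-entry weight vector, not taken from the dashboard.
toy_weights = [0.5, -0.25, 0.125, 0.125]
share = l1_share(toy_weights, 0)  # 0.5 out of a total absolute weight of 1.0
print(f"{share:.1%}")
```

Under this reading, shares below 1% for the top neurons would indicate that the feature's weight is spread across many neurons rather than concentrated in a few.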
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
517      +0.17            0.05
406      +0.14            0.04
405      +0.11            0.04
Negative Logits
Token              Logit
<bos>              -0.62
ConstraintMaker    -0.51
狸                 -0.50
秩                 -0.50
刺客               -0.50
茅                 -0.48
称号               -0.48
huelga             -0.48
nakalista          -0.48
nev                -0.47
Positive Logits
Token        Logit
swarovski    1.26
hairc        1.25
war          1.21
Darío        1.18
WAR          1.16
unwarran     1.14
unlaw        1.14
nutella      1.13
War          1.11
War          1.11
Activations Density 0.091%
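
A rough sketch of how an activation density figure like this could be computed, assuming density means the fraction of sampled tokens on which the feature's activation is above zero. The threshold, the function name `activation_density`, and the toy data are assumptions for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: a token counts as "active" when the feature fires at all,
    meaning its activation is greater than zero.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over four tokens, not taken from the dashboard.
toy_activations = [0.0, 0.0, 1.5, 0.0]
density = activation_density(toy_activations)  # 1 active token out of 4
print(f"{density:.3%}")
```

Under this reading, a density of 0.091% would mean the feature fires on roughly 9 of every 10,000 tokens.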