Explanations
references to military activities and technology
Neuron Alignment
Index   Value   % of L₁
297     +0.12   0.4%
368     +0.09   0.3%
1042    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
297     +0.12      0.05
368     +0.09      0.04
284     +0.09      0.05
Negative Logits
emphat      -1.05
purcha      -1.04
increa      -1.01
desir       -1.01
?...        -1.01
attemp      -1.00
inev        -1.00
laun        -0.98
reluct      -0.98
apprehen    -0.97
Positive Logits
<bos>             0.79
Full              0.77
Full              0.76
FULL              0.75
full              0.73
full              0.72
WriteTagHelper    0.65
FULL              0.64
Fully             0.63
fully             0.62
Activations Density: 0.412%