INDEX
Explanations
phrases related to technical instructions or guides
Neuron Alignment
Index    Value    % of L₁
410      +0.15    0.8%
188      +0.13    0.7%
1141     +0.13    0.7%
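The Neuron Alignment table ranks neurons by how strongly the feature points along each of them. Below is a minimal sketch of how such a ranking could be computed, assuming (not confirmed by the dashboard) that "Value" is the feature's weight on neuron i and "% of L₁" is |w_i| divided by the sum of all |w_j|. The function names `l1_share` and `top_aligned` are hypothetical.

```python
def l1_share(weights):
    """Return (index, weight, |w_i| / sum_j |w_j|) for every neuron.

    Assumption: `weights` is the feature's weight vector in the neuron basis.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("weight vector has zero L1 norm")
    return [(i, w, abs(w) / total) for i, w in enumerate(weights)]


def top_aligned(weights, k=3):
    """Return the k neurons with the largest absolute weight (ties keep index order)."""
    rows = l1_share(weights)
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows[:k]


if __name__ == "__main__":
    for idx, value, share in top_aligned([0.25, -0.5, 0.125, 0.125]):
        print(f"{idx:<6} {value:+.2f}  {share * 100:.1f}%")
```

Under this reading, a value of +0.15 making up only 0.8% of the L₁ norm means the feature's weight is spread thinly across many neurons rather than concentrated in one.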
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
1622     +0.15            0.04
1805     +0.13            0.04
1363     +0.13            0.04
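The Correlated Neurons table reports two different similarity measures. A minimal sketch follows, assuming the P. Corr. column is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and that Cos Sim. is the cosine similarity between the two corresponding weight vectors; both interpretations are assumptions, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

Keeping the two measures separate is informative: a weak positive correlation (+0.15) paired with a near-zero cosine (0.04) would suggest these neurons sometimes fire alongside the feature without sharing its direction.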
Negative Logits
<bos>        -1.51
intersper    -1.43
xxvi         -1.13
gaily        -1.11
xxii         -1.09
xxiii        -1.08
encomp       -1.07
gratify      -1.06
unspeak      -1.06
xxv          -1.05
Positive Logits
sl           1.19
SL           1.16
SL           1.09
sl           1.09
Sl           1.07
Sl           1.03
gl           0.81
PSL          0.76
kl           0.75
FL           0.72
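The logit lists show which output tokens the feature pushes up or down. A minimal sketch, assuming each token's score is the dot product of the feature's decoder direction with that token's unembedding vector (any final layer norm scaling is ignored here); `top_logits` and the toy vocabulary are hypothetical.

```python
def top_logits(direction, unembed_cols, vocab, k=3):
    """Rank tokens by the direct logit effect of a feature direction.

    direction: the feature's output direction in the residual stream (assumed).
    unembed_cols: unembed_cols[j] is the unembedding vector for vocab[j].
    Returns (positive, negative) lists of (token, score), strongest first.
    """
    scores = [sum(d * u for d, u in zip(direction, col)) for col in unembed_cols]
    order = sorted(range(len(vocab)), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    pos, neg = top_logits([1.0, 0.0], [[2, 0], [-1, 0], [0.5, 0], [-3, 0]],
                          ["sl", "xxv", "gl", "gaily"], k=2)
    print(pos, neg)
```

Because the score is a single dot product per token, repeated surface forms such as "sl" and "SL" can appear more than once when the tokenizer holds several variants (for example with and without a leading space).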
Activation Density: 0.491%
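Activation density is usually read as the fraction of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold); `activation_density` is a hypothetical helper.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    density = activation_density([0.0, 0.0, 1.2, 0.0, 0.3])
    print(f"{density:.3%}")
```

Under this definition, a density of 0.491% corresponds to the feature being active on roughly 49 of every 10,000 tokens.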