INDEX
Explanations
references to decision-making and evaluation processes
Neuron Alignment
Index    Value    % of L₁
169      +0.14    0.7%
431      +0.13    0.7%
480      +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
70       +0.14       0.04
81       +0.13       0.06
271      +0.10       0.06
Negative Logits
¦     -2.47
ī     -2.35
IJ    -2.21
¨     -2.14
¾     -2.12
Ĭ     -2.12
Ľ     -2.06
İ     -2.05
Ļ     -2.02
¤     -2.01
Positive Logits
á̝     2.00
sed   1.82
áŁ    1.67
refs  1.51
EXT   1.51
á̬     1.50
$).   1.49
\\    1.49
оÐ    1.48
à³    1.46
Activations Density: 3.720%