Explanations
programming-related terms and instructions
Neuron Alignment
Index    Value    % of L₁
674      +0.42    1.5%
1150     +0.13    0.4%
1804     +0.12    0.4%
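One plausible reading of the Value and % of L₁ columns is the following. Value would be the feature's weight on each neuron, and % of L₁ would be that weight's absolute value divided by the L1 norm of the whole weight vector. This reading is assumed here rather than stated on the page. The sketch below ranks neurons that way. The helper name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(weights, k=3):
    """Return the top-k neurons by |weight| with each one's share of the L1 norm.

    Assumption: `weights` is the feature's weight vector over neurons
    (hypothetical input; the dashboard does not show the full vector).
    """
    l1 = sum(abs(w) for w in weights)
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Toy vector, not real model weights.
toy = [0.05, 0.42, -0.02, 0.13, 0.12, 0.26]
for index, value, share in neuron_alignment(toy):
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, a Value of +0.42 at 1.5% implies an L1 norm of about 0.42 / 0.015 = 28. The other two rows fit the same norm once the one-decimal rounding of their percentages is taken into account.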
Correlated Neurons
Index    Pearson Corr.    Cos. Sim.
284      +0.42            0.08
1013     +0.13            0.08
1804     +0.12            0.06
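The correlation and cosine-similarity columns measure related but different quantities. Pearson correlation is the cosine similarity of two vectors after each has been mean-centered. Plain cosine similarity compares the raw vectors. The page does not say which vectors are compared in each column. The sketch below simply assumes per-token activations of the feature and of one neuron, using hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Undefined (raises ZeroDivisionError) if either vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical per-token activations for a feature and one neuron (assumption).
feature_acts = [0.0, 2.1, 0.0, 0.0, 3.4, 0.5]
neuron_acts = [0.1, 1.0, -0.2, 0.0, 1.5, 0.3]
print(f"Pearson: {pearson_correlation(feature_acts, neuron_acts):+.2f}")
print(f"Cosine:  {cosine_similarity(feature_acts, neuron_acts):.2f}")
```

The mean-centering step is why the two numbers can diverge. A shared offset inflates cosine similarity but cancels out of the Pearson correlation.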
Negative Logits
increa    -3.20
inev      -3.15
effe      -3.08
affor     -3.06
emphat    -2.99
fta       -2.98
accla     -2.98
fuf       -2.98
reluct    -2.96
unden     -2.89
Positive Logits
<bos>                   2.46
.                       0.93
(token not rendered)    0.85
。                      0.84
中                      0.82
;                       0.82
(token not rendered)    0.80
,                       0.80
(token not rendered)    0.80
(token not rendered)    0.80
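Lists like the two logit lists above are commonly computed in three steps. First, take the dot product of the feature's output direction with each token's unembedding vector. Then sort the resulting scores. Finally, keep the most negative and most positive entries. That procedure is assumed here, not confirmed by the page. The sketch below uses a hypothetical 2-dimensional model with a 4-token vocabulary. The helper names `logit_effects` and `top_and_bottom` are illustrative.

```python
import heapq


def logit_effects(direction, unembed_rows):
    """Dot the feature direction with each token's unembedding vector.

    Assumption: `unembed_rows[t]` is the unembedding vector for token t.
    """
    return [sum(d * u for d, u in zip(direction, row)) for row in unembed_rows]


def top_and_bottom(scores, vocab, k=2):
    """Return the k most negative and k most positive (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    return negative, positive


# Toy model: hypothetical values, not real weights.
vocab = ["<bos>", ".", "increa", "inev"]
unembed_rows = [[1.0, 0.5], [0.4, 0.1], [-1.0, -0.6], [-0.8, -0.9]]
direction = [2.0, 1.0]
scores = logit_effects(direction, unembed_rows)
negative, positive = top_and_bottom(scores, vocab)
print("Negative:", negative)
print("Positive:", positive)
```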
Activation Density: 0.888%
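Activation density is read here, as an assumption, as the share of tokens in the evaluation sample on which the feature's activation is above zero. On that reading, 0.888% is roughly 9 tokens in every 1,000. The helper `activation_density`, its `threshold` parameter, and the sample below are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold).

    Assumption: "fires" means strictly above `threshold` (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations over 1,000 tokens: the feature fires on 9 of them.
acts = [0.0] * 991 + [1.2] * 9
print(f"Activation density: {activation_density(acts):.3%}")
```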