INDEX
Explanations
code elements related to functions and return statements
Neuron Alignment
Index   Value   % of L₁
115     +0.11   0.6%
510     +0.11   0.6%
366     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
366     +0.11      0.07
342     +0.11      0.07
197     +0.11      0.06
Negative Logits
·           -1.96
Ļ           -1.73
ª           -1.60
°           -1.60
Ģ           -1.60
´           -1.60
«           -1.59
ij          -1.59
µ           -1.52
activated   -1.51
Positive Logits
eLife       1.77
ificantly   1.76
reads       1.59
portrait    1.48
oracle      1.43
concer      1.43
hood        1.40
regards     1.40
nost        1.38
itives      1.38
Activations Density: 0.558%