INDEX
Explanations
blocks of code or structured elements in programming languages
Neuron Alignment
Index   Value   % of L₁
179     +0.17   1.0%
126     +0.12   0.6%
385     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
179     +0.17      0.06
313     +0.12      0.04
385     +0.11      0.05
Negative Logits
Token            Value
acia             -1.63
representative   -1.59
diagram          -1.47
NIH              -1.46
supervisor       -1.41
summary          -1.38
aphys            -1.38
").              -1.34
acy              -1.34
).)              -1.31
Positive Logits
Token   Value
»¿      1.83
Ĵ       1.72
Ĩ       1.71
³       1.66
§       1.60
IJ      1.58
ľĵ      1.58
Ł       1.53
Ĭ       1.51
Ń       1.49
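The positive-logit tokens above look like the printable stand-ins that byte-level BPE tokenizers use for raw bytes. The page does not say which tokenizer produced them, so the following is only a sketch under the assumption that the vocabulary follows the GPT-2-style byte-to-character table; `bytes_to_unicode` and `token_to_bytes` are illustrative helper names, not part of this dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the model behind this dashboard uses this mapping.
    """
    # Bytes that already have a visible glyph keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted to the next free code point from 256 up.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed byte-level token."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["\u00bb\u00bf", "\u0134", "\u00b3", "\u00a7", "\u0143"]:
        print(repr(tok), token_to_bytes(tok).hex())
```

If the assumption holds, each of these tokens corresponds to one or two bytes at or above 0x80, that is, fragments of multi-byte UTF-8 sequences rather than complete characters.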
Activations Density: 0.261%