INDEX
Explanations
technical terminology and code structure related to software development or programming languages
Neuron Alignment
Index    Value    % of L₁
210      +0.17    0.9%
322      +0.16    0.9%
108      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
322      +0.17       0.03
370      +0.16       0.09
210      +0.12       0.02
Negative Logits
±     -2.37
·¸    -2.20
ľ     -2.07
?”    -2.06
ĻĤ    -2.06
Ķ     -2.04
Ļ     -1.97
ĺ     -1.96
°     -1.91
½     -1.90
Positive Logits
quo          1.60
arshal       1.44
↵            1.36
↵            1.36
otherwise    1.32
otherwise    1.30
itude        1.26
etition      1.24
uality       1.23
biamo        1.23
Activations Density: 1.265%