Explanations
code-related terminology and structures in programming languages
Neuron Alignment
Index   Value   % of L₁
47      +0.16   0.9%
63      +0.14   0.8%
479     +0.13   0.7%
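The dashboard does not say how these columns are computed. Here is a minimal sketch, assuming the convention common in SAE dashboard tooling: Value is the feature's decoder weight on each MLP neuron, and % of L₁ is that weight's absolute value divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment` and its arguments are hypothetical.

```python
import numpy as np

def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Return (neuron index, weight, share of L1 norm) for the top-k neurons.

    Hypothetical helper: assumes `decoder_vec` is the feature's decoder
    direction expressed in neuron space (one entry per MLP neuron).
    """
    l1 = np.abs(decoder_vec).sum()        # L1 norm of the full decoder vector
    top = np.argsort(-decoder_vec)[:k]    # largest positive weights first
    return [
        (int(i), float(decoder_vec[i]), float(abs(decoder_vec[i]) / l1))
        for i in top
    ]
```

Ranking by signed value matches the table above, which lists only positive weights. Ranking by `abs(decoder_vec)` would be the alternative if negative alignments should also appear.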
Correlated Neurons
Index   P. Corr.   Cos Sim.
444     +0.16      0.00
63      +0.14      0.06
47      +0.13      0.14
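Here is a minimal sketch of the two Correlated Neurons columns. It assumes P. Corr. is the Pearson correlation between the feature's activations and each neuron's activations over the same tokens, and that Cos Sim. is the cosine similarity between a feature direction and each neuron's weight direction. The dashboard does not state which weight vectors it compares, so `feature_dir`, `neuron_dirs` and all function names here are hypothetical.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two equal-length activation series."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two weight vectors."""
    return float((u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def correlated_neurons(feature_acts, neuron_acts, feature_dir, neuron_dirs, k=3):
    """Hypothetical helper returning (neuron index, Pearson corr., cos sim.).

    feature_acts: (n_tokens,) feature activations
    neuron_acts:  (n_tokens, n_neurons) neuron activations on the same tokens
    feature_dir:  (d,) assumed feature weight direction
    neuron_dirs:  (n_neurons, d) assumed per-neuron weight directions
    """
    corrs = np.array([pearson(feature_acts, neuron_acts[:, j])
                      for j in range(neuron_acts.shape[1])])
    top = np.argsort(-corrs)[:k]  # rank neurons by activation correlation
    return [(int(j), float(corrs[j]), cosine(feature_dir, neuron_dirs[j]))
            for j in top]
```

The two columns measure different things. One is co-activation on data and the other is geometric overlap of weights, which is why a neuron can correlate positively while having near-zero cosine similarity (as with neuron 444 above).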
Negative Logits
¿½   -2.05
ľĵ   -1.93
®    -1.78
¨    -1.78
¿    -1.75
ª    -1.73
º    -1.72
©    -1.67
²    -1.66
Į    -1.65
Positive Logits
such         1.56
simplest     1.46
nerves       1.43
orate        1.38
ermine       1.37
arer         1.35
_            1.34
underlying   1.30
ples         1.28
ody          1.27
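The logit lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. Here is a minimal sketch, assuming the effect is the feature's decoder direction multiplied by the unembedding matrix. It ignores the final layer norm, which real tooling may fold in. `logit_effects`, `decoder_dir`, `W_U` and `vocab` are hypothetical names.

```python
import numpy as np

def logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Hypothetical helper returning the k most negative and k most positive tokens.

    decoder_dir: (d_model,) feature decoder direction (assumed)
    W_U:         (d_model, n_vocab) unembedding matrix (assumed)
    vocab:       list of n_vocab token strings
    """
    effects = decoder_dir @ W_U          # direct effect on each token's logit
    order = np.argsort(effects)          # ascending: most negative first
    neg = [(vocab[i], float(effects[i])) for i in order[:k]]
    pos = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return neg, pos
```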
Activation Density: 4.308%
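Activation density is the share of tokens on which the feature fires. Here is a minimal sketch. It assumes that "fires" means an activation strictly above zero (the threshold is an assumption) and that activations are collected over a sample of dataset tokens. `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(feature_acts, threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(feature_acts, dtype=float)
    return float((acts > threshold).mean())
```

For example, `f"{100 * activation_density(acts):.3f}%"` renders a value in the same style as the 4.308% above.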