INDEX
Explanations
programming-related terms and keywords
Neuron Alignment
Index   Value   % of L₁
156     +0.19   1.1%
271     +0.18   1.1%
17      +0.15   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
369     +0.19      0.05
17      +0.18      0.03
436     +0.15      0.08
Negative Logits
lain            -1.46
?**             -1.41
completeness    -1.41
aland           -1.36
Guarant         -1.36
eligible        -1.35
.**             -1.34
]"              -1.34
Son             -1.30
contradiction   -1.28
Positive Logits
ĻĤ    2.58
£     2.57
§     2.47
Ľ     2.46
Īĺ    2.40
ģ     2.38
ij    2.34
ĺ     2.31
ĭ     2.31
±     2.30
Activations Density: 0.682%