Explanations
fragments of code or programming language syntax
Neuron Alignment
Index    Value    % of L₁
47       +0.16    0.9%
153      +0.15    0.9%
441      +0.14    0.8%
Correlated Neurons
Index    P. Corr.    Cos Sim.
441      +0.16       -0.00
135      +0.15       0.13
47       +0.14       0.16
Negative Logits
erville        -1.55
Brewing        -1.50
Panama         -1.47
Venezuela      -1.39
ÂĴ             -1.38
Kubernetes     -1.37
Agriculture    -1.37
agriculture    -1.36
Excell         -1.34
Plants         -1.31
Positive Logits
↵↵                2.17
↵                 2.17
(blank)           2.17
↵                 2.17
(blank)           2.17
<|outofrange|>    2.17
↵                 2.17
(blank)           2.17
(blank)           2.17
↵                 2.17
Activations Density 5.125%