Index
Explanation: specific mathematical notation or expressions
Neuron Alignment
Index   Value   % of L₁
170     +0.15   0.9%
94      +0.14   0.8%
230     +0.12   0.7%
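The card does not say how the Neuron Alignment figures are computed. The sketch below makes two assumptions. First, Value is the feature's decoder weight on each neuron. Second, % of L₁ is the absolute value of that weight divided by the L₁ norm of the whole decoder vector. It ranks neurons by absolute weight. The function name neuron_alignment and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, signed weight, % of L1 norm) for the k largest-magnitude weights.

    Assumption: decoder_vec holds the feature's decoder weight on each neuron.
    """
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the full decoder vector
    # Rank neurons by |weight|; Python's sort is stable, so ties keep index order.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy decoder vector (hypothetical values, not taken from the card).
    for index, value, pct in neuron_alignment([0.125, -0.5, 0.25, 0.125]):
        print(f"{index:<6}{value:+.2f}  {pct:.1f}%")
```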
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
306     +0.15           0.01
106     +0.14           0.01
258     +0.12           0.01
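The sketch below computes the two statistics in the Correlated Neurons table under stated assumptions. P. Corr. is taken to be the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. Cos Sim. is taken to be the cosine similarity between two weight vectors; the card does not say which vectors the dashboard compares. The function names pearson and cosine are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. feature vs. neuron activations)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series has no defined correlation; report 0 here
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (e.g. two weight directions)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else dot / norm
```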
Negative Logits
Token     Logit
println   -1.75
ils       -1.70
ths       -1.67
tti       -1.64
letons    -1.58
ween      -1.52
*.*       -1.43
*(        -1.37
eme       -1.36
eds       -1.35
Positive Logits
Token     Logit
©         1.97
ively     1.88
Īĺ        1.78
¸         1.64
Ļª        1.61
ĨĴ        1.60
ģ         1.41
IJ        1.40
Ķ         1.38
¿½        1.37
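The two logit lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. The sketch below assumes each score is the dot product of the feature's decoder direction with a token's unembedding vector. This is a common direct-logit-attribution convention, but the card does not confirm it. The matrix layout (one unembedding vector per token) and the names logit_effects and unembed are hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: unembed[j] is the unembedding vector for vocab[j], in the same
    basis as decoder_vec, and a token's score is their dot product.
    """
    scores = [sum(d * u for d, u in zip(decoder_vec, col)) for col in unembed]
    order = sorted(range(len(vocab)), key=lambda j: scores[j])  # ascending score
    negative = [(vocab[j], scores[j]) for j in order[:k]]  # most negative first
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]  # most positive first
    return negative, positive
```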
Activation Density: 0.002%
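Activation density is read here as the percentage of sampled tokens on which the feature activates. At 0.002%, that is roughly one token in 50,000. The sketch below assumes "activates" means an activation strictly above a threshold of zero. The function name activation_density and its threshold parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)
```

For example, one firing token among 50,000 gives a density of 0.002%.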