INDEX
Explanations
documentation or comments in code snippets
Neuron Alignment
Index    Value    % of L₁
410      +0.15    0.9%
118      +0.14    0.8%
11       +0.13    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
410      +0.15       0.03
165      +0.14       0.02
416      +0.13       0.02
Negative Logits
Token     Logit
merits    -1.62
1998      -1.56
2006      -1.56
2008      -1.55
2004      -1.55
PBS       -1.52
2002      -1.52
2007      -1.52
2009      -1.49
Fox       -1.47
Positive Logits
Token               Logit
½                   2.71
¿½                  2.24
³                   2.03
ĻĤ                  1.98
↵                   1.97
(blank in source)   1.97
↵↵                  1.97
č↵č↵                1.97
↵↵↵                 1.97
↵                   1.97
Activations Density 0.249%