INDEX
Explanations
function calls or expressions
Neuron Alignment
Index    Value    % of L₁
162      +0.12    0.7%
368      +0.12    0.7%
250      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
305      +0.12       0.04
335      +0.12       0.04
79       +0.12       0.04
Negative Logits
Ļª         -3.03
¬          -2.90
ĨĴ         -2.80
↵↵         -2.77
↵          -2.77
(empty)    -2.77
↵          -2.77
↵          -2.77
↵↵         -2.77
(empty)    -2.77
Positive Logits
blogger    1.51
*+         1.41
>()        1.36
üller      1.36
ried       1.36
RN         1.36
roid       1.34
rer        1.34
repl       1.33
ermann     1.31
Activations Density 0.121%