Explanations
function return statements in programming code
Neuron Alignment
Index   Value   % of L₁
478     +0.19   1.0%
240     +0.11   0.6%
199     +0.10   0.6%
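The alignment figures can be approximated with a short calculation. This is a minimal sketch, assuming that "Value" is an entry of the feature's decoder vector expressed in the MLP neuron basis and that "% of L₁" is that entry's magnitude divided by the vector's L₁ norm. The function name `neuron_alignment` and the toy input are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k (neuron index, weight, % of L1) triples for one feature.

    Assumption: decoder_row holds the feature's decoder weights in the
    neuron basis, and "% of L1" is |weight| / sum(|weights|) * 100.
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neurons by weight magnitude, largest first; the sign is kept in the output.
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Hypothetical toy decoder row over four neurons.
for idx, value, pct in neuron_alignment([0.5, -0.1, 0.2, 0.2]):
    print(f"{idx:<6} {value:+.2f}   {pct:.1f}%")
```

Because the percentage is taken over the whole decoder row, a feature spread across many neurons shows small percentages even for its top entries, as in the table above.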
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
299     +0.19           0.03
206     +0.11           0.02
113     +0.10           0.04
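The two columns use standard similarity measures. A minimal sketch, assuming the correlation column is the Pearson correlation between the feature's and a neuron's activations over the same tokens, and the cosine-similarity column compares two weight vectors; which vectors the dashboard actually compares is an assumption here, and the helper names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (e.g. two weight directions)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy data: activations over five tokens, and two weight vectors.
feature_acts = [0.0, 1.2, 0.0, 0.4, 0.0]
neuron_acts = [0.1, 0.9, 0.0, 0.5, 0.2]
print(f"{pearson(feature_acts, neuron_acts):+.2f}")
print(f"{cosine_similarity([1.0, 0.0, 0.2], [0.0, 1.0, 0.1]):.2f}")
```

The two measures can disagree: activations can be moderately correlated across tokens while the underlying weight directions are nearly orthogonal.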
Negative Logits
ther     -1.81
orro     -1.81
sburg    -1.76
amon     -1.76
ero      -1.74
rug      -1.71
)]       -1.66
ctomy    -1.63
oin      -1.62
utz      -1.62
Positive Logits
µ              1.66
favour         1.65
quo            1.62
yes            1.62
parl           1.58
things         1.57
false          1.56
matern         1.55
clear          1.55
conventional   1.54
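One common way to produce token lists like these, assumed here rather than confirmed by the dashboard, is to project the feature's decoder direction through the unembedding matrix (ignoring the final layer norm) and read off the most negative and most positive tokens. The function names and the toy `direction` and `unembed` values below are hypothetical.

```python
def logit_effects(direction, unembed):
    """Dot product of a residual-stream direction with each token's unembedding vector.

    Assumption: unembed maps token -> unembedding vector; final layer norm is ignored.
    """
    return {tok: sum(d * w for d, w in zip(direction, col)) for tok, col in unembed.items()}


def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


# Hypothetical two-dimensional toy model with a four-token vocabulary.
direction = [1.0, -0.5]
unembed = {"yes": [1.5, 0.2], "false": [1.0, -1.0], "ther": [-1.0, 1.6], "quo": [0.1, 0.0]}
pos, neg = top_and_bottom(logit_effects(direction, unembed), k=2)
print("Positive:", pos)
print("Negative:", neg)
```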
Activation Density: 0.162%
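Activation density is presumably the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical activations over eight tokens; two are non-zero.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.2, 0.0]
print(f"{activation_density(acts):.3f}%")  # prints 25.000%
```

Under this definition, a density of 0.162% corresponds to roughly 1.6 active tokens per 1,000.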