INDEX
Explanations
information or steps in a set of instructions
Neuron Alignment
Index   Value   % of L₁
50      +0.23   0.9%
1013    +0.13   0.5%
260     +0.09   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1013    +0.23      0.08
1997    +0.13      0.05
1041    +0.09      0.05
Negative Logits
<bos>     -2.41
if        -0.72
public    -0.71
assume    -0.67
if        -0.67
若        -0.64
יע        -0.63
look      -0.63
case      -0.62
be        -0.62
Positive Logits
emphat    1.89
Juf       1.83
guarante  1.74
Augu      1.73
maneu     1.71
aen       1.70
accla     1.69
fta       1.67
inev      1.67
squa      1.66
Activations Density: 0.736%