Explanations
references to ordinal positions in lists or sequences
Neuron Alignment
Index    Value    % of L₁
128      +0.11    0.6%
497      +0.11    0.6%
137      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
30       +0.11       0.05
476      +0.11       0.05
497      +0.11       0.05
Negative Logits
MY         -1.48
neither    -1.47
naments    -1.44
alth       -1.41
opener     -1.41
both       -1.35
thirst     -1.31
Auth       -1.31
ursed      -1.31
aven       -1.29
Positive Logits
ones     2.00
dozen    1.89
enez     1.63
blic     1.61
heses    1.50
\)       1.48
half     1.46
iples    1.45
()),     1.44
hand     1.43
Activations Density 0.136%