Explanations
mentions of blank items or spaces
Neuron Alignment
Index   Value   % of L₁
1272    +0.14   0.7%
1101    +0.13   0.7%
1133    +0.13   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1805    +0.14      0.02
1363    +0.13      0.03
1101    +0.13      0.02
Negative Logits
<bos>       -1.66
Merritt     -0.57
McGuire     -0.56
s           -0.54
shed        -0.54
Khor        -0.52
McCormack   -0.52
rs          -0.51
twij        -0.51
Cormack     -0.51
Positive Logits
Blank    1.43
Blank    1.39
BLANK    1.37
blank    1.36
Blan     1.34
Blan     1.33
blan     1.33
blank    1.32
blanks   1.26
blan     1.22
Activations Density: 0.382%