Explanations
structured data representations, particularly lists and arrays
Neuron Alignment
Index   Value   % of L₁
156     +0.17   1.0%
412     +0.15   0.9%
489     +0.14   0.8%
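
The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's direction in the neuron basis. The sketch below shows that reading under stated assumptions: the page does not say how the column is computed, and `top_aligned` and the example vector are hypothetical names and data, not taken from the dashboard.

```python
import numpy as np

def top_aligned(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: share_of_l1 = |value| / sum(|direction|). The dashboard's
    exact definition is not stated in the source.
    """
    direction = np.asarray(direction, dtype=float)
    l1 = np.abs(direction).sum()
    if l1 == 0.0:
        raise ValueError("direction has zero L1 norm")
    # Stable sort so that ties keep their original index order.
    order = np.argsort(-np.abs(direction), kind="stable")[:k]
    return [(int(i), float(direction[i]), float(abs(direction[i]) / l1))
            for i in order]

# Hypothetical toy vector, not data from the dashboard.
for idx, value, share in top_aligned([0.5, -0.3, 0.0, 0.2], k=2):
    print(f"{idx:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading the rows above are mutually consistent up to rounding: 0.17 / 0.010 = 17, 0.15 / 0.009 ≈ 16.7, and 0.14 / 0.008 = 17.5, which would put the L₁ norm of the direction at roughly 17.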
Correlated Neurons
Index   P. Corr.   Cos Sim.
412     +0.17      0.03
17      +0.15      0.01
41      +0.14      0.02
Negative Logits
Token      Logit
ĵ          -1.99
pires      -1.90
§          -1.63
ļ          -1.62
µ          -1.57
ľĵ         -1.54
¤          -1.53
IJ         -1.47
protobuf   -1.46
role       -1.42
Positive Logits
Token      Logit
ening      2.22
horn       1.92
Items      1.85
eners      1.84
orio       1.70
ured       1.70
ener       1.67
etable     1.66
Caption    1.62
item       1.60
Activations Density: 0.084%
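
Activation density is usually understood as the fraction of sampled tokens on which the feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the sample data are hypothetical, and the dashboard's own threshold and sample set are not given in the source.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when activation > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("no activations supplied")
    return float((acts > threshold).mean())

# Hypothetical sample: one active token out of four.
density = activation_density([0.0, 0.0, 0.0, 1.5])
print(f"{density:.3%}")
```

On this reading, a density of 0.084% corresponds to the feature firing on about 84 of every 100,000 tokens.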