INDEX
Explanations
references to distinct phases or steps in processes
Neuron Alignment
Index   Value   % of L₁
156     +0.16   1.0%
376     +0.15   0.9%
115     +0.13   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
268     +0.16      0.01
128     +0.15      0.01
296     +0.13      0.01
Negative Logits
Token        Logit
§            -2.31
unnumbered   -1.97
ľĵ           -1.95
ĥ½           -1.90
·            -1.87
½            -1.80
¥            -1.80
©            -1.76
ĵ            -1.75
İ            -1.70
Positive Logits
Token    Logit
heet     1.88
arily    1.81
etting   1.78
ball     1.75
oon      1.71
etter    1.71
heets    1.67
etica    1.64
icle     1.62
point    1.59
Activations Density 0.017%