INDEX
Explanations
mentions of "points" in various contexts
Neuron Alignment
Index    Value    % of L₁
156      +0.15    0.9%
376      +0.14    0.8%
255      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
255      +0.15       0.01
286      +0.14       0.01
260      +0.12       0.01
Negative Logits
Token    Logit
ĥ½       -3.20
§        -2.81
Ĥ        -2.67
©        -2.63
¿½       -2.61
Ĥ¬       -2.59
¦        -2.56
ľ        -2.52
·        -2.52
ĥ        -2.42
Positive Logits
Token     Logit
heet      2.33
ional     1.99
erve      1.97
ionale    1.88
hips      1.85
heets     1.85
ior       1.85
agem      1.83
avia      1.80
etable    1.79
Activations Density 0.013%