Explanations
references to the formatting or structuring functions in code
Neuron Alignment
Index    Value    % of L₁
376      +0.13    0.7%
95       +0.11    0.6%
172      +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
275      +0.13       0.01
95       +0.11       0.01
67       +0.10       0.01
Negative Logits
Īĺ    -2.26
·¸    -2.21
Ĥ    -2.18
Ĥ¬    -2.10
ı    -2.09
ĭ    -2.00
¦    -1.92
±    -1.85
µ    -1.84
ĥ    -1.83
Positive Logits
festivals     1.57
profession    1.57
ios           1.52
erals         1.48
lantern       1.43
nell          1.36
festival      1.36
nad           1.35
ings          1.35
chat          1.35
Activations Density 0.021%