INDEX
Explanations
statistical measurements and their representations
Neuron Alignment
Index    Value    % of L₁
156      +0.18    1.0%
23       +0.14    0.8%
20       +0.13    0.7%
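The dashboard does not define the "% of L₁" column. One plausible reading, offered here only as an assumption, is that it reports each neuron's absolute value as a share of the L₁ norm of the whole weight vector. Under that reading the three rows would imply an L₁ norm of roughly 17.5 to 18.6, which is consistent given the rounding of the percentages. A minimal sketch follows; `l1_share` and `toy_weights` are hypothetical names, not part of any dashboard API.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this is what the "% of L1" column measures; the
    dashboard itself does not state the definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, not the real feature weights.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```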
Correlated Neurons
Index    P. Corr.    Cos Sim.
360      +0.18       0.01
330      +0.14       0.02
382      +0.13       0.01
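The table reports two similarity measures per neuron. Assuming "P. Corr." stands for Pearson correlation and "Cos Sim." for cosine similarity, the sketch below shows how the two differ: Pearson correlation is the cosine similarity of the mean-centered vectors. The dashboard does not say which vectors each column is computed over (for example activations versus weights), so the inputs here are purely illustrative.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as cosine similarity after centering."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Illustrative inputs only: the two measures can disagree strongly.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
print(pearson_correlation(x, y), cosine_similarity(x, y))
```

For these inputs the Pearson correlation is negative while the cosine similarity is positive, which is why a row can show a noticeable correlation next to a near-zero cosine similarity.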
Negative Logits
Token      Value
»¿         -2.43
ı          -2.17
Ĺ          -1.98
£          -1.94
(blank)    -1.93
↵          -1.93
↵          -1.93
↵          -1.93
↵↵         -1.93
↵ ³³³      -1.93
Positive Logits
Token         Value
ations        1.68
literature    1.61
Jacob         1.50
custom        1.49
written       1.47
keit          1.47
threads       1.46
ality         1.41
thread        1.41
osity         1.41
Activations Density 0.096%
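Activation density is commonly read as the fraction of tokens on which a feature fires. Assuming that definition (the dashboard does not spell it out), a density of 0.096% would mean the feature is active on roughly one token in a thousand. The sketch below uses a hypothetical `activation_density` helper and a made-up activation list.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold.

    Assumption: "active" means a value greater than zero; the actual
    dashboard may use a different cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for four tokens; only one is nonzero.
toy_activations = [0.0, 0.0, 1.2, 0.0]
print(f"{activation_density(toy_activations):.3%}")  # prints 25.000%
```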