INDEX
Explanations
references to specific quantities or items
Neuron Alignment
Index   Value   % of L₁
369     +0.22   1.2%
59      +0.11   0.6%
77      +0.11   0.6%
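
The "% of L₁" column can be read as each neuron's share of the feature vector's L₁ norm (the sum of absolute values). That reading is an assumption, since the page does not define the column, but the rows are consistent with it: +0.22 at 1.2% and +0.11 at 0.6% both imply an L₁ norm of roughly 18. A minimal sketch with a toy vector, where the function name and values are hypothetical:

```python
def percent_of_l1(values):
    """Return each entry's share of the vector's L1 norm, in percent.

    Assumed definition: 100 * |v_i| / sum_j |v_j|. The dashboard does not
    state its formula, so this is an illustration only.
    """
    l1_norm = sum(abs(v) for v in values)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(v) / l1_norm for v in values]


# Toy vector, not taken from the dashboard.
shares = percent_of_l1([0.5, -0.25, 0.25])
print(shares)  # [50.0, 25.0, 25.0]
```

Under this definition the shares sum to 100% across all neurons, so the small percentages in the table would indicate that the feature is spread over many neurons rather than concentrated in a few.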
Correlated Neurons
Index   P. Corr.   Cos Sim.
410     +0.22      0.04
369     +0.11      0.02
59      +0.11      0.02
Negative Logits
ending      -1.55
agraph      -1.52
actic       -1.51
thirds      -1.50
ends        -1.49
essential   -1.48
onomy       -1.47
itional     -1.46
ative       -1.43
osity       -1.42
Positive Logits
±    2.28
¿    2.19
ĻĤ   2.18
¹    2.03
ī    2.02
½    2.02
«    2.01
Ľ    1.86
»    1.84
´    1.84
Activations Density: 0.221%
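
Activation density is commonly taken to be the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition and a threshold of zero; the page states neither. The function name and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: 100 * (number of active tokens) / (number of tokens).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations, not taken from the dashboard: one active token out of four.
print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 25.0
```

On this reading, a density of 0.221% would mean the feature fires on roughly 2 of every 1,000 sampled tokens.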