INDEX
Explanations
phrases related to additional information or supplementary content
Neuron Alignment
Index   Value   % of L₁
23      +0.17   1.0%
489     +0.10   0.6%
430     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
215     +0.17      0.03
4       +0.10      0.02
130     +0.10      0.03
Negative Logits
Ĵ     -2.40
Ļª    -2.31
Ĥ¬    -2.28
¼     -2.23
ĨĴ    -2.17
½     -1.93
Ģ     -1.86
Į     -1.85
Ļ     -1.83
º     -1.82
Positive Logits
finding       1.50
edia          1.46
enberg        1.45
suppose       1.39
idegger       1.31
ype           1.30
ferential     1.27
yler          1.25
black         1.24
coming        1.22
Activations Density 0.083%