INDEX
Explanations
instances of positive feedback or praise
Neuron Alignment
Index   Value   % of L₁
417     +0.14   0.8%
82      +0.12   0.7%
339     +0.12   0.6%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
35      +0.14           0.04
202     +0.12           0.05
175     +0.12           0.04
Negative Logits
ľ    -2.10
¾    -2.04
Ŀ    -2.04
³    -1.99
º    -1.94
į    -1.92
ı    -1.84
¡    -1.82
µ    -1.79
ª    -1.78
Positive Logits
Code        1.59
species     1.56
Species     1.50
others      1.48
sexes       1.43
","         1.38
itone       1.37
Parenthood  1.36
please      1.36
Others      1.36
Activations Density 0.033%