INDEX
Explanations
the term "mean" in various contexts
Neuron Alignment
Index   Value   % of L₁
52      +0.13   0.7%
368     +0.12   0.7%
198     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
312     +0.13      0.03
266     +0.12      0.03
116     +0.11      0.03
Negative Logits
cha       -1.78
yn        -1.69
hed       -1.69
ub        -1.60
]{.       -1.53
htra      -1.51
iers      -1.48
hn        -1.48
encies    -1.45
bsd       -1.43
Positive Logits
lights         1.64
GMT            1.58
identical      1.53
breath         1.48
suit           1.46
accompanies    1.46
glare          1.45
photographs    1.45
coma           1.43
watches        1.42
Activations Density 0.011%