INDEX
Explanations
phrases that mention groups or quantities
Neuron Alignment
Index   Value   % of L₁
343     +0.12   0.7%
494     +0.11   0.6%
477     +0.10   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
249     +0.12      0.02
335     +0.11      0.02
477     +0.10      0.02
Negative Logits
Token   Logit
Ĥ¬      -2.52
ŀ       -2.42
¤       -2.42
Ĥ       -2.37
¹       -2.35
Ł       -2.34
ĨĴ      -2.33
º       -2.31
Ļ       -2.28
Ń       -2.24
Positive Logits
Token     Logit
dozen     1.88
ousand    1.76
hundred   1.76
ships     1.71
lie       1.69
ages      1.66
zet       1.66
tering    1.65
iece      1.65
frey      1.64
Activations Density: 0.090%