INDEX
Explanations
instances of the word "all."
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
176     +0.14   0.8%
281     +0.12   0.7%
165     +0.11   0.6%
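The "% of L₁" column can be read as each neuron's share of the feature vector's total L₁ norm. The sketch below assumes the table ranks the components of the feature's direction over MLP neurons by absolute value, then divides each one by the vector's L₁ norm. That reading is an assumption, not something the dashboard states. The function name `neuron_alignment` and the toy vector are hypothetical and are not the dashboard's own code.

```python
def neuron_alignment(feature_vec, k=3):
    """Return (neuron_index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption (hypothetical reading of the dashboard): "% of L1" is
    |value| / sum(|v_i|) taken over every neuron in the feature vector.
    """
    l1 = sum(abs(v) for v in feature_vec)
    # Rank neuron indices by absolute weight, largest first (ties keep input order).
    ranked = sorted(range(len(feature_vec)), key=lambda i: abs(feature_vec[i]), reverse=True)
    return [(i, feature_vec[i], abs(feature_vec[i]) / l1) for i in ranked[:k]]


# Toy 5-neuron vector (hypothetical data, not taken from this feature).
for idx, value, share in neuron_alignment([0.5, -0.1, 0.25, 0.0, -0.15]):
    print(f"{idx:>3}  {value:+.2f}  {share:.1%}")
```

Under this reading, a top neuron that holds only 0.8% of the L₁ mass means no single neuron dominates: the feature's weight is spread thinly across many neurons.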
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
281     +0.14           0.06
165     +0.12           0.05
492     +0.11           0.06
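The correlation and cosine-similarity columns measure related but different things. The sketch below assumes both statistics compare the feature's activation with a neuron's activation over the same sequence of tokens. The dashboard does not say this, so treat it as an assumption. The helper names and the toy activations are hypothetical. Pearson correlation centers both series before comparing them, and cosine similarity does not.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation: covariance of the centered series over the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_sim(xs, ys):
    """Cosine similarity: dot product over the product of L2 norms, with no mean subtraction."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


# Hypothetical per-token activations for one feature and one neuron.
feature = [1.0, 2.0, 3.0]
neuron = [2.0, 3.0, 4.0]
print(pearson_corr(feature, neuron))  # perfectly linear relationship, so 1 up to rounding
print(cosine_sim(feature, neuron))    # below 1 because the neuron carries a constant offset
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient and can replace the hand-written version.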
Negative Logits
¿      -2.37
±      -1.96
ŀ      -1.88
¿½     -1.85
¹      -1.85
·      -1.81
´      -1.76
¯      -1.76
©      -1.66
gni    -1.65
Positive Logits
usions    2.03
iances    1.97
owed      1.95
owing     1.94
kinds     1.89
uded      1.85
sorts     1.85
ows       1.83
ocation   1.74
ograft    1.74
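The two logit lists show which output tokens the feature pushes up or down when it fires. The sketch below assumes the feature's effect on the logits is its residual-stream direction multiplied by the unembedding matrix, and it ignores layer norm. For a feature that lives in an MLP layer, the direction would first need to be mapped through the MLP output weights; this sketch omits that step. The names `logit_weights` and `top_and_bottom` and the four-token toy vocabulary are all hypothetical.

```python
def logit_weights(direction, unembed):
    """Per-token logit contribution of a direction.

    unembed is a list of rows, one per residual dimension, each holding one entry per
    vocabulary token (hypothetical layout). Layer norm is ignored in this sketch.
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[r] * unembed[r][t] for r in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(weights, tokens, k):
    """Return the k most negative and the k most positive (token, weight) pairs."""
    order = sorted(range(len(weights)), key=lambda t: weights[t])
    negative = [(tokens[t], weights[t]) for t in order[:k]]
    positive = [(tokens[t], weights[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy 2-dimensional residual stream and 4-token vocabulary (hypothetical data).
tokens = ["a", "b", "c", "d"]
weights = logit_weights([1.0, -1.0], [[2.0, 0.0, 1.0, -1.0], [0.0, 1.0, 1.0, 1.0]])
negative, positive = top_and_bottom(weights, tokens, k=2)
print("negative:", negative)
print("positive:", positive)
```

Read against the explanation at the top, several of the top positive tokens look like continuations of "all": allusions, alliances, allowed, allowing, alluded, allows, and allocation. "kinds" and "sorts" fit the phrases "all kinds" and "all sorts".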
Activations Density 0.169%
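Activation density is the fraction of tokens on which the feature fires. At 0.169%, the feature is active on roughly 1.69 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The function name and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0 here)."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations over eight tokens; the feature fires on two of them.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3%}")
```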