Explanations
percentages mentioned in text
Neuron Alignment
Index    Value    % of L₁
1334     +0.12    0.4%
1323     +0.12    0.4%
168      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
75       +0.12       0.02
168      +0.12       0.02
971      +0.11       0.02
Negative Logits
Token         Logit
reluct        -0.97
inev          -0.96
impra         -0.96
secon         -0.95
lyon          -0.95
Daven         -0.94
embra         -0.94
madonna       -0.94
vagab         -0.94
intermitt     -0.94
Positive Logits
Token          Logit
percentage     1.24
percentage     1.05
percent        0.99
percentages    0.94
Percentage     0.92
Percentage     0.86
percent        0.81
\%             0.78
%              0.78
%              0.78
Activations Density: 0.054%