INDEX
Explanations
references to licenses or license-related terms
Neuron Alignment
Index   Value   % of L₁
205     +0.14   0.8%
497     +0.12   0.7%
231     +0.11   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
329     +0.14      0.01
205     +0.12      0.01
315     +0.11      0.01
Negative Logits
Token     Value
Īĺ        -4.38
↵         -4.23
↵         -4.23
(blank)   -4.23
↵         -4.23
(blank)   -4.23
(blank)   -4.23
(blank)   -4.23
↵ Âł      -4.23
↵         -4.23
Positive Logits
Token       Value
Appeal      1.31
ôt          1.30
arser       1.28
dire        1.27
Agreement   1.25
seed        1.21
agreement   1.19
ieur        1.19
IEW         1.18
ÑĮ          1.18
Activations Density 0.060%