INDEX
Explanations
references to authors and author metadata in documents
Neuron Alignment
Index   Value   % of L₁
407     +0.13   0.7%
410     +0.12   0.6%
443     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
163     +0.13      0.02
410     +0.12      0.03
174     +0.11      0.03
Negative Logits
Token     Logit
Ł         -1.49
↵         -1.49
↵ Âł      -1.49
(blank)   -1.49
↵         -1.49
(blank)   -1.49
(blank)   -1.49
↵         -1.49
(blank)   -1.49
↵         -1.49
Positive Logits
Token       Logit
itatively   2.32
ization     2.31
izations    2.20
ities       2.15
esses       2.06
izes        2.05
ised        1.99
iousness    1.85
itative     1.84
isation     1.83
Activations Density: 0.163%