INDEX
Explanations
references and citations in academic texts
Neuron Alignment
Index   Value   % of L₁
282     +0.14   0.8%
136     +0.13   0.7%
478     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
136     +0.14      -0.00
302     +0.13      0.02
243     +0.12      0.03
Negative Logits
ħ     -2.66
Ģ     -2.42
ĭ     -2.27
Ļª    -2.21
Ļ     -2.18
ĩ     -2.18
ŀ     -2.16
Ħ     -2.15
¬     -2.13
ģ     -2.12
Positive Logits
){#            1.82
Hence          1.78
However        1.68
Therefore      1.68
Besides        1.67
However        1.63
since          1.56
Nevertheless   1.55
Thus           1.54
Hence          1.54
Activations Density 0.153%
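
The column quantities reported above (% of L₁, P. Corr., Cos Sim., and the activation density) are not defined by the dashboard itself. The sketch below shows one plausible reading of each, under stated assumptions: that "% of L₁" is one entry's share of a weight vector's L1 norm, that "P. Corr." is the Pearson correlation between two activation series, that "Cos Sim." is the cosine similarity between two weight vectors, and that the density is the fraction of positions where the feature's activation is nonzero. All function names and inputs are hypothetical and for illustration only.

```python
import math

def l1_share(weights, index):
    """Assumed meaning of '% of L1': |w[index]| divided by the L1 norm of w."""
    total = sum(abs(w) for w in weights)
    return abs(weights[index]) / total

def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def activation_density(activations):
    """Assumed meaning of 'Activations Density': fraction of nonzero activations."""
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)

# Toy inputs (made up, not taken from the dashboard).
toy_weights = [0.5, -0.25, 0.25]
share = l1_share(toy_weights, 0)
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
cos = cosine([1.0, 0.0], [0.0, 1.0])
dens = activation_density([0.0, 0.0, 0.3, 0.0])
```

Under this reading, a correlation of +0.14 paired with a cosine similarity near zero would mean the neuron tends to fire alongside the feature even though their weight directions are almost orthogonal.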