INDEX
Explanations
book titles or references
Neuron Alignment
Index   Value   % of L₁
1964    +0.13   0.5%
1053    +0.13   0.5%
805     +0.12   0.4%
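The sketch below shows how the Neuron Alignment columns could be computed. It assumes that Value is the feature's decoder weight on each neuron. It also assumes that % of L₁ is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The dashboard does not state either definition, and the function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, weight, fraction of L1 norm) for the k largest weights.

    Assumption (hypothetical reading of the dashboard): 'Value' is the feature's
    decoder weight on a neuron, and '% of L1' is |weight| divided by the L1 norm
    of the whole decoder vector.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Sort neuron indices by weight, largest first; ties keep their original order.
    top = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)[:k]
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in top]


# Hypothetical 4-neuron decoder vector whose L1 norm is exactly 2.0.
for idx, value, share in neuron_alignment([0.25, -0.5, 1.0, 0.25]):
    print(f"{idx:<6}{value:+.2f}   {share:.1%}")
```

Under this reading, a weight of +0.13 at 0.5% implies a decoder L₁ norm of roughly 0.13 / 0.005 ≈ 26. Both displayed figures are rounded, so this is only approximate.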
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
499     +0.13           0.05
648     +0.13           0.04
680     +0.12           0.04
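The sketch below covers the P. Corr. column only. It assumes the column is the Pearson correlation between the feature's activation and a neuron's activation, taken across the same sampled tokens. The dashboard does not say which vectors the cosine-similarity column compares, so that column is not modelled here. The function name and the toy activations are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations left unnormalised: the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical per-token activations of the feature and of one neuron.
feature_acts = [0.0, 0.0, 2.1, 0.0, 0.7]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.3]
print(round(pearson(feature_acts, neuron_acts), 3))
```

A value near +0.13, as in the table, would indicate only a weak linear relationship between the feature and each listed neuron.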
Negative Logits
increa     -0.85
fortn      -0.82
affor      -0.80
shenan     -0.77
strick     -0.76
michelin   -0.75
volunte    -0.73
attemp     -0.73
tucson     -0.72
jurassic   -0.71
Positive Logits
<bos>       0.66
Ten         0.64
tenth       0.64
Ten         0.57
diez        0.53
ten         0.53
OneToMany   0.53
simplifié   0.52
October     0.52
ponses      0.50
Activation Density: 0.175%
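Activation density is presumably the share of sampled tokens on which the feature's activation is above zero. On that reading, 0.175% means the feature fires on roughly 1.75 of every 1,000 tokens, or about 1 in 570. The sketch below assumes a strictly-greater-than-zero threshold, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("no activations given")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations over 8 tokens: the feature fires on 1 of them.
acts = [0.0, 0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0]
print(f"Activation Density: {activation_density(acts):.3%}")
```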