Explanations
quantifiable statistics or proportions related to populations
Neuron Alignment
Index   Value   % of L₁
50      +0.33   1.1%
1252    +0.09   0.3%
198     +0.07   0.3%
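The sketch below shows one way a Neuron Alignment table could be produced. It assumes that "Value" is the feature's decoder weight on each MLP neuron. It also assumes that "% of L₁" is the absolute value of that weight divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
from typing import List, Tuple


def neuron_alignment(decoder: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Return (neuron index, weight, share of L1 norm) for the k largest weights.

    Assumption (hypothetical): `decoder` is the feature's decoder vector,
    one entry per neuron.
    """
    l1 = sum(abs(w) for w in decoder)
    # Rank neuron indices by signed weight, largest (most positive) first.
    ranked = sorted(range(len(decoder)), key=lambda i: decoder[i], reverse=True)
    # Share of the L1 norm; guard against an all-zero vector.
    return [(i, decoder[i], abs(decoder[i]) / l1 if l1 else 0.0) for i in ranked[:k]]


if __name__ == "__main__":
    toy = [0.0, 0.33, -0.10, 0.09, 0.07, 0.01]  # made-up decoder weights
    for idx, value, share in neuron_alignment(toy):
        print(f"{idx:>5}  {value:+.2f}  {share:.1%}")
```

The ranking uses signed weights, so only positively aligned neurons appear. That matches the "+" values shown above.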
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1252    +0.33           0.04
1726    +0.09           0.03
125     +0.07           0.03
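Here is a minimal sketch of the two statistics behind the correlated-neuron columns. It assumes both columns compare a feature's activations with a neuron's activations over the same tokens; the dashboard may instead compare weight vectors for the cosine column. Pearson correlation is the cosine similarity of the mean-centred series, so the two numbers can differ sharply for the same pair. The helper names `cosine` and `pearson` are hypothetical.

```python
import math
from typing import Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: cosine similarity after subtracting each series' mean."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


if __name__ == "__main__":
    feature = [0.0, 2.0, 0.0, 1.5, 0.0]   # made-up feature activations per token
    neuron = [0.1, 0.9, -0.2, 0.7, 0.0]   # made-up neuron activations per token
    print(round(pearson(feature, neuron), 3), round(cosine(feature, neuron), 3))
```

Adding a constant offset to one series leaves its Pearson correlation unchanged but lowers its cosine similarity. This is one reason the two columns need not agree.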
Negative Logits
<bos>    -1.80
utop     -0.99
anse     -0.94
solidar  -0.94
meras    -0.92
sò       -0.92
robus    -0.91
ù        -0.90
rè       -0.86
tà       -0.84
Positive Logits
of         0.96
Of         0.69
ofthe      0.67
soulign    0.66
Putih      0.66
ของ        0.65
của        0.64
doté       0.63
fabricado  0.61
unwarran   0.61
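The two logit lists can be approximated by projecting the feature's decoder direction onto each token's unembedding column and sorting the scores. The sketch below uses that direct projection and ignores any final LayerNorm or scaling a real dashboard may apply. It represents the unembedding as a hypothetical token-to-vector dictionary; `logit_effects` and the toy vocabulary are illustrative only.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]],
                  k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens by direction . unembedding column."""
    # Dot product of the feature direction with each token's unembedding vector.
    scores = {tok: sum(d * u for d, u in zip(direction, col))
              for tok, col in unembed.items()}
    ordered = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ordered[:k]
    positive = ordered[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    toy_unembed = {  # hypothetical 2-dimensional unembedding columns
        "of": [0.9, 0.2],
        "<bos>": [-1.5, 0.1],
        "utop": [-0.8, 0.4],
        "the": [0.3, -0.1],
    }
    neg, pos = logit_effects([1.0, 0.5], toy_unembed, k=2)
    print("negative:", neg)
    print("positive:", pos)
```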
Activation Density: 0.121%
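Activation density is the fraction of tokens on which the feature fires. At 0.121%, that is about 121 active tokens per 100,000, or roughly 1 in 826. The sketch below assumes "fires" means an activation strictly above zero. The threshold parameter and the function name `activation_density` are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly greater than `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


if __name__ == "__main__":
    acts = [0.0] * 99_879 + [1.0] * 121  # made-up activations over 100,000 tokens
    print(f"{activation_density(acts):.3%}")
```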