Explanations
references to scale or measurements in various contexts
Neuron Alignment
Index   Value   % of L₁
50      +0.25   1.5%
1974    +0.11   0.6%
411     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1276    +0.25      0.02
1331    +0.11      0.02
411     +0.11      0.02
Negative Logits
<bos>              -2.69
<?                 -0.65
(no token shown)   -0.61
admit              -0.60
ⓧ                  -0.59
put                -0.58
#                  -0.56
defend             -0.56
cup                -0.56
cố                 -0.54
Positive Logits
accla      1.53
suspic     1.50
ecru       1.47
fatis      1.43
jaya       1.40
wien       1.40
effe       1.40
unwarran   1.39
bandung    1.39
nece       1.37
Activations Density: 0.048%