INDEX
Explanations
phrases emphasizing specific instructions or precautions in a recipe
Neuron Alignment
Index   Value   % of L₁
50      +0.57   2.7%
1741    +0.19   0.9%
1385    +0.14   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1385    +0.57      0.12
16      +0.19      0.10
1741    +0.14      0.02
Negative Logits
<bos>    -1.98
,        -0.61
and      -0.59
一       -0.58
1        -0.58
define   -0.57
、       -0.57
/        -0.56
2        -0.56
.        -0.55
Positive Logits
bandung    1.56
Minang     1.41
milano     1.36
affor      1.34
jati       1.33
jaya       1.33
jawa       1.33
Manufact   1.32
lele       1.32
thut       1.30
Activations Density 0.890%