INDEX
Explanations
phrases related to product descriptions
Neuron Alignment
Index   Value   % of L₁
341     +0.09   0.3%
1590    +0.08   0.2%
58      +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1590    +0.09      0.02
1120    +0.08      0.02
385     +0.08      0.02
Negative Logits
do          -0.79
try         -0.79
don         -0.78
can         -0.77
get         -0.77
,           -0.75
й           -0.75
<bos>       -0.73
continue    -0.73
for         -0.73
Positive Logits
Wel         2.51
Wel         2.27
WEL         2.16
fta         2.11
fte         2.10
effe        2.08
ftu         2.02
§.          1.98
secon       1.98
applau      1.97
Activations Density 0.139%