INDEX
Explanations
words related to sales or promotions
Neuron Alignment
Index   Value   % of L₁
554     +0.09   0.3%
1590    +0.08   0.3%
1407    +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1363    +0.09      0.02
1372    +0.08      0.02
1590    +0.08      0.02
Negative Logits
ATEGY              -0.62
<bos>              -0.56
ANDUM              -0.56
ocarcinoma         -0.56
േ                  -0.54
AsUp               -0.54
ContentAlignment   -0.54
NOSIS              -0.54
CodeAttribute      -0.53
ുറ                 -0.53
Positive Logits
ped      2.81
Ped      2.81
Ped      2.60
ped      2.39
PED      2.23
Pedal    1.85
pedal    1.77
PED      1.76
pedal    1.75
pedals   1.67
Activations Density: 0.202%