INDEX
Explanations
phrases related to subscription and engagement with media content
Neuron Alignment
Index   Value   % of L₁
453     +0.24   1.2%
50      +0.23   1.1%
1343    +0.16   0.8%
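The page does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that each value is divided by the L₁ norm (sum of absolute values) of the full weight vector it comes from. Under that reading the three rows above are roughly consistent with an L₁ norm of about 20 (0.24 / 0.012 = 20). The sketch below illustrates the calculation on a hypothetical toy vector; `l1_share` and `toy` are illustrative names, not part of the dashboard or its data.

```python
def l1_share(weights):
    """Return (index, value, share of L1 norm) tuples, largest |value| first.

    Assumption: "% of L1" means |value| / sum(|all values|).
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("weight vector has zero L1 norm")
    rows = [(i, w, abs(w) / l1) for i, w in enumerate(weights)]
    return sorted(rows, key=lambda r: abs(r[1]), reverse=True)


# Hypothetical toy vector, not the feature's real weights.
toy = [0.24, -0.5, 0.16, 0.1]
top = l1_share(toy)

for index, value, share in top:
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Sorting by absolute value keeps strongly negative entries near the top, which matters if the dashboard ranks by magnitude rather than by signed value.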
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.24      0.10
453     +0.23      0.09
1689    +0.16      0.05
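The table reports both a Pearson correlation and a cosine similarity, and the two columns differ noticeably. The page does not say which vectors are being compared, so the following is only a generic sketch of how the two metrics relate: Pearson correlation is the cosine similarity of the mean-centered vectors. The function names and the sample vectors are illustrative.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Illustrative data: perfectly anti-correlated, yet cosine stays positive
# because both vectors have only positive entries.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
print(round(pearson(x, y), 3), round(cosine(x, y), 3))
```

Because of the centering step, the two numbers need not agree in size or even in sign, so differing columns are not by themselves a sign of an error.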
Negative Logits
<bos>   -2.99
,       -1.08
.       -1.02
for     -1.01
at      -1.00
的      -0.99
in      -0.99
to      -0.98
(       -0.98
of      -0.98
Positive Logits
wien        3.35
effe        3.21
unden       3.19
dispen      3.13
increa      3.12
affor       3.11
stockholm   3.11
accla       3.08
oner        3.00
guarante    3.00
Activations Density: 0.364%
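The density figure is not defined on the page. A common interpretation, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below computes that fraction for a hypothetical list of per-token activations; `activation_density` and the sample values are illustrative only.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts tokens with activation > 0.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations, not data from this feature.
sample = [0.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.364% would mean the feature fires on roughly 1 in every 275 tokens.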