INDEX
Explanations
Names that likely belong to characters in a movie or TV show
Neuron Alignment
Index   Value   % of L₁
1343    +0.11   0.4%
1654    +0.10   0.4%
489     +0.08   0.3%
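The Neuron Alignment rows can be read with a small sketch. It assumes that the feature has a decoder vector expressed in the neuron basis, that rows are ranked by signed value, and that "% of L₁" is each entry's absolute value divided by the L₁ norm of the whole vector. The function name and toy vector are hypothetical; the dashboard's exact computation is not stated here.

```python
def neuron_alignment(decoder_vec, k=3):
    """Top-k neurons by decoder weight, each with its share of the vector's L1 norm.

    Assumption: rows are ranked by signed value, and the share is |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Hypothetical 6-neuron decoder vector (L1 norm = 0.27).
toy = [0.02, -0.05, 0.11, 0.0, 0.08, -0.01]
for idx, val, share in neuron_alignment(toy):
    print(f"{idx:>5}  {val:+.2f}  {share:.1%}")
```

A real decoder vector spans thousands of neurons, which is why even the strongest entries above account for well under 1% of the L₁ norm.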
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1343    +0.11           0.05
227     +0.10           0.04
1056    +0.08           0.04
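The two columns of the Correlated Neurons table measure different things. The following sketch assumes that the Pearson correlation is taken between the feature's activations and a neuron's activations over the same tokens, and that cosine similarity compares two weight vectors, for example a feature direction and a neuron's weight vector. Both helpers and all inputs are hypothetical illustrations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical activations of the feature and one neuron on the same five tokens.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.7]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.4]
print(round(pearson(feature_acts, neuron_acts), 3))
print(round(cosine([1.0, 0.0], [1.0, 1.0]), 3))
```

Because one metric depends on activations and the other on weights, a neuron can correlate weakly (+0.11) while its weights are nearly orthogonal to the feature (0.05), as in the table.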
Negative Logits
Token       Value
<bos>       -2.23
guang       -0.98
qiao        -0.84
xiu         -0.84
qian        -0.75
huo         -0.73
HideFlags   -0.72
yao         -0.71
xun         -0.71
luo         -0.70
Positive Logits
Token       Value
soulign     1.25
véhic       1.21
fameux      1.13
accla       1.13
unspeak     1.09
dénon       1.04
eiffel      1.03
Mejía       1.03
zove        1.02
vété        1.01
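Both logit lists can be produced by one procedure. The sketch below assumes that each token's value is the dot product of the feature's decoder direction with that token's unembedding column, which is a direct logit effect that ignores any final LayerNorm scaling. The tokens and lists are then sorted to take the top and bottom k. The vocabulary, matrix, and function names are hypothetical.

```python
def logit_effects(decoder_dir, W_U, vocab):
    """Score each token by decoder_dir . W_U[:, t].

    W_U is stored as d_model rows of vocab_size entries (a list of lists).
    """
    scores = []
    for t, token in enumerate(vocab):
        s = sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
        scores.append((token, s))
    return scores


def top_and_bottom(scores, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ordered = sorted(scores, key=lambda p: p[1])
    return ordered[::-1][:k], ordered[:k]


# Hypothetical 2-dimensional model with a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = [[1.0, 0.0, -1.0, 0.5],
       [0.0, 1.0, 0.0, -1.0]]
decoder_dir = [1.0, 0.5]

positive, negative = top_and_bottom(logit_effects(decoder_dir, W_U, vocab), k=2)
print("positive:", positive)
print("negative:", negative)
```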
Activation Density: 0.176%
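If activation density is the fraction of tokens on which the feature's activation is strictly positive, then 0.176% corresponds to roughly 1 in 570 tokens (1 / 0.00176 ≈ 568). The sketch below computes that fraction. The zero threshold and the toy activations are assumptions.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed 0)."""
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Hypothetical: the feature fires on 1 of 10,000 tokens.
acts = [0.0] * 9_999 + [1.3]
print(f"{activation_density(acts):.3%}")  # 0.010%
```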