INDEX
Explanations
information-related text, such as disclaimers, copyright statements, and notices
Neuron Alignment
Index   Value   % of L₁
2034    +0.26   0.8%
382     +0.18   0.6%
1699    +0.17   0.5%
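A common way to build a Neuron Alignment table is to read it off the feature's decoder vector. "Value" is the decoder weight on a neuron, and "% of L₁" is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The sketch below assumes two things: the SAE reconstructs MLP neuron activations, so entry `i` of the decoder row belongs to neuron `i`, and neurons are ranked by weight magnitude. The function name `neuron_alignment` is hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (neuron_index, weight, fraction_of_L1) for the k largest-magnitude weights.

    Assumption: decoder_row[i] is the feature's decoder weight on MLP neuron i.
    """
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder row has zero L1 norm")
    # Rank by magnitude; Python's sort stays stable with reverse=True, so ties keep index order.
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    toy_row = [0.0, 0.5, -0.25, 0.25, 0.0]  # hypothetical 5-neuron decoder row
    for idx, weight, frac in neuron_alignment(toy_row, k=2):
        print(f"{idx}  {weight:+.2f}  {frac:.1%}")
```

On this reading, a weight of +0.26 that is 0.8% of L₁ means the decoder row spreads its mass over many neurons, with no single neuron dominating.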
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
382     +0.26           0.08
1535    +0.18           0.06
1896    +0.17           0.04
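The card does not define these two columns. One plausible reading, assumed here, is that both compare the feature's activations with a neuron's activations over the same tokens. Pearson correlation first subtracts each series' mean. Cosine similarity uses the raw values. The function names and toy data below are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Uncentered cosine similarity of two equal-length activation lists."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation: cosine similarity after mean-centering both series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature.

    feature_acts: the feature's activation on each token.
    neuron_acts: dict mapping neuron index -> activations on the same tokens.
    Returns [(neuron_index, pearson_corr, cos_sim), ...] for the top k.
    """
    scores = [
        (i, pearson(feature_acts, acts), cosine_similarity(feature_acts, acts))
        for i, acts in neuron_acts.items()
    ]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    feature = [0.0, 1.0, 0.0, 2.0]                           # hypothetical activations
    neurons = {7: [0.0, 2.0, 0.0, 4.0], 9: [1.0, 0.0, 1.0, 0.0]}
    for idx, corr, cos in correlated_neurons(feature, neurons, k=2):
        print(f"{idx}  {corr:+.2f}  {cos:.2f}")
```

Because the two measures differ only in centering, they can disagree a lot. A neuron with a large constant offset can correlate moderately with a sparse feature yet have a small cosine similarity, which would fit the pairs on this card (+0.26 against 0.08).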
Negative Logits
churrasco    -0.99
barbacoa     -0.94
Lma          -0.84
pimiento     -0.84
Áng          -0.83
Darío        -0.81
limestones   -0.80
jamón        -0.80
repug        -0.79
piña         -0.79
Positive Logits
<eos>        0.94
Please       0.88
<bos>        0.84
Therefore    0.79
↵↵           0.71
You          0.68
We           0.68
For          0.67
However      0.66
It           0.65
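Negative and positive logit lists like these are usually made by projecting the feature's output direction onto every vocabulary token and keeping the lowest and highest scores. The sketch below makes three assumptions. The SAE sits on MLP neurons, so the decoder row is first mapped to the residual stream through the MLP output matrix (`w_out`, rows = neurons). The final LayerNorm is ignored. Scores are left unscaled, so the card's own normalization is not reproduced. All names are hypothetical.

```python
def matvec_rows(vec, matrix):
    """Compute vec @ matrix, where matrix is a list of len(vec) rows."""
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(len(matrix[0]))]


def direct_logit_tokens(decoder_row, w_out, w_u, vocab, k=10):
    """Return (positive, negative) lists of (token, score) pairs.

    decoder_row: feature's decoder weights over MLP neurons (length d_mlp).
    w_out: d_mlp x d_model MLP output matrix.
    w_u: d_model x d_vocab unembedding matrix.
    vocab: token strings, one per unembedding column.
    """
    residual = matvec_rows(decoder_row, w_out)  # d_mlp -> d_model
    logits = matvec_rows(residual, w_u)         # d_model -> d_vocab
    ranked = sorted(zip(vocab, logits), key=lambda p: p[1])
    negative = ranked[:k]                       # most suppressed tokens
    positive = ranked[::-1][:k]                 # most promoted tokens
    return positive, negative


if __name__ == "__main__":
    w_out = [[1.0, 0.0], [0.0, 1.0]]            # toy 2x2 identity
    w_u = [[2.0, -1.0, 0.0], [0.0, 0.0, 0.0]]   # toy 3-token vocabulary
    pos, neg = direct_logit_tokens([1.0, 0.0], w_out, w_u, ["<eos>", "churrasco", "Please"], k=1)
    print(pos, neg)
```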
Activation Density: 0.359%
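Activation density is commonly defined as the fraction of evaluated tokens on which the feature fires. That definition, with "fires" taken to mean an activation above zero, is assumed here. Under it, 0.359% works out to about 3.6 tokens per 1,000, or roughly one token in 279. The function name is hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold (assumed 0)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    toy_acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]  # hypothetical activations
    print(f"{activation_density(toy_acts):.3%}")
```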