Explanations
No Explanations Found
Neuron Alignment
Index   Value   % of L₁
674     +0.26   0.9%
1741    +0.06   0.2%
1870    +0.06   0.2%
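The "% of L₁" column can be read as each neuron's share of the L1 norm of the feature's weight vector over neurons. The sketch below assumes that definition: share = |wᵢ| / Σⱼ|wⱼ| × 100. The helpers `l1_share` and `top_aligned` and the toy vector are hypothetical illustrations, not this feature's actual weights or the dashboard's own code.

```python
def l1_share(weights):
    """Return each weight's share of the vector's L1 norm, in percent (assumed definition)."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / total for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, % of L1) for the k largest weights, largest first."""
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Hypothetical toy weights, not this feature's real values.
toy = [0.5, -0.25, 0.25]
print(top_aligned(toy, k=2))  # [(0, 0.5, 50.0), (2, 0.25, 25.0)]
```

A small share means the feature's weight mass is spread across many neurons rather than concentrated in the listed ones.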
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
2       -0.26           0.00
0       -0.06           0.00
1       -0.06           0.00
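The table above reports two measures: Pearson correlation and cosine similarity. The sketch below computes both for two equal-length sequences, such as a feature's and a neuron's activations over the same tokens. It assumes that is what is being compared; the dashboard's exact inputs are not stated here. The helper names are hypothetical. Pearson correlation is undefined when either sequence is constant, so the sketch returns NaN in that case. Returning 0.0 for a zero-norm vector in `cosine` is likewise a convention chosen for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector has zero norm (a convention)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(cosine([1, 0], [0, 1]))         # 0.0
```

Unlike cosine similarity, Pearson correlation centres both sequences first, so the two measures can disagree for sequences with large means.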
Negative Logits
to      -1.02
do      -0.98
have    -0.97
a       -0.97
all     -0.96
in      -0.95
no      -0.94
an      -0.93
so      -0.93
are     -0.91
Positive Logits
<bos>        8.21
ftu          2.06
fta          2.02
fatis        1.94
dispen       1.94
fup          1.90
paff         1.84
poff         1.83
expandindo   1.83
ftre         1.83
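Logit lists like the two above are commonly produced by projecting the feature's decoder direction onto each token's unembedding vector and keeping the extremes. The sketch below assumes that procedure and ignores the model's final layer norm. The inputs `decoder_dir` and `unembed` are hypothetical toy values, not this model's weights.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the decoder direction with each token's unembedding vector (assumed method)."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[::-1][:k]  # largest logit first
    negative = ranked[:k]        # most negative logit first
    return positive, negative


# Hypothetical 2-dimensional toy model with a three-token vocabulary.
decoder_dir = [1.0, 0.0]
unembed = {"a": [2.0, 0.0], "b": [-1.0, 5.0], "c": [0.5, 1.0]}
effects = logit_effects(decoder_dir, unembed)
print(top_and_bottom(effects, 1))  # ([('a', 2.0)], [('b', -1.0)])
```

In this reading, a large positive value means that writing the feature's direction into the residual stream pushes the model toward predicting that token.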
Activation Density: 0.000%
No Known Activations
This feature has no known activations.
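Activation density can be understood as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The helper name and sample lists are hypothetical. With that definition, a feature that never fires on the sample reports 0.000%, as here.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        raise ValueError("no activations sampled")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


print(f"{activation_density([0.0, 0.0, 0.0, 0.0]):.3f}%")  # 0.000%
print(f"{activation_density([0.0, 1.5, 0.0, 2.0]):.3f}%")  # 50.000%
```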