INDEX
Explanations
No Explanations Found
Neuron Alignment
Index    Value    % of L₁
674      +0.30    1.3%
1253     +0.07    0.3%
1870     +0.06    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
2        -0.30       0.00
0        -0.07       0.00
1        -0.06       0.00
Negative Logits
so      -1.53
in      -1.50
to      -1.50
,       -1.47
as      -1.46
that    -1.45
we      -1.44
was     -1.43
can     -1.43
for     -1.43
Positive Logits
<bos>     10.93
ftu       3.06
fta       2.93
fatis     2.88
sappi     2.82
dispen    2.81
ftre      2.80
fup       2.79
squa      2.75
paff      2.73
Activations Density 0.000%
No Known Activations
This feature has no known activations.