INDEX
Explanations
No Explanations Found
Neuron Alignment
Index   Value   % of L₁
674     +0.33   1.3%
1741    +0.07   0.3%
1870    +0.05   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2       -0.33      0.00
0       -0.07      0.00
1       -0.05      0.00
Negative Logits
Token      Logit
etti       -0.84
ে          -0.83
so         -0.83
as         -0.82
instead    -0.81
als        -0.80
in         -0.80
는         -0.80
া          -0.79
even       -0.78
Positive Logits
Token       Logit
<bos>       10.82
encomp      3.77
intersper   3.65
increa      3.59
affor       3.57
guarante    3.56
fuf         3.51
maneu       3.49
perfet      3.42
uninten     3.42
Activations Density: 0.000%
No Known Activations
This feature has no known activations.