INDEX
Explanations
No Explanations Found
Neuron Alignment
Index   Value   % of L₁
674     +0.36   1.6%
1741    +0.04   0.2%
1120    +0.04   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2       -0.36      0.00
0       -0.04      0.00
1       -0.04      0.00
Negative Logits
भी           -0.88
াই           -0.88
لينك         -0.87
so           -0.87
не           -0.87
corruption   -0.87
,            -0.85
to           -0.85
ко           -0.85
неу          -0.85
Positive Logits
<bos>        10.98
fta          3.36
fuf          3.34
squa         3.32
effe         3.28
encomp       3.28
ftu          3.25
guarante     3.24
desir        3.24
affor        3.22
Activations Density 0.000%
No Known Activations
This feature has no known activations.