INDEX
Explanations
No Explanations Found
Neuron Alignment
Index   Value   % of L₁
674     +0.42   3.2%
1741    +0.08   0.6%
906     +0.04   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2       -0.42      0.00
0       -0.08      0.00
1       -0.04      0.00
Negative Logits
ে           -1.00
াই          -0.98
Sự          -0.98
Đối         -0.98
لينك        -0.97
corruption  -0.97
া           -0.97
भी          -0.97
ার          -0.96
Những       -0.96
Positive Logits
<bos>     12.59
encomp    4.03
fuf       4.02
fta       4.00
effe      3.97
guarante  3.97
squa      3.95
affor     3.90
desir     3.89
purcha    3.88
Activations Density 0.000%
This feature has no known activations.