Explanations
No Explanations Found
Neuron Alignment
Index   Value   % of L₁
674     +0.41   2.3%
1741    +0.04   0.2%
921     +0.03   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2       -0.41      0.00
0       -0.04      0.00
1       -0.03      0.00
Negative Logits
Token   Logit
ার      -0.87
भी      -0.87
ेटा     -0.86
لينك    -0.85
া       -0.84
राब     -0.82
не      -0.82
ে       -0.82
ি       -0.82
虽然    -0.82
Positive Logits
Token      Logit
<bos>      12.07
fuf        3.44
effe       3.38
squa       3.37
fta        3.29
secon      3.28
guarante   3.27
encomp     3.27
desir      3.26
affor      3.24
Activations Density: 0.000%
This feature has no known activations.