INDEX
Explanations
metal-related terms or items mentioned in a text
Neuron Alignment
Index   Value   % of L₁
50      +0.09   0.4%
341     +0.08   0.3%
687     +0.07   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1515    +0.09      0.03
368     +0.08      0.03
1837    +0.07      0.03
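The "P. Corr." and "Cos Sim." columns are presumably Pearson correlation and cosine similarity. The page does not say which vectors are being compared, so the sketch below is only an illustration on hypothetical lists of numbers; the function names are not taken from the dashboard. It shows that Pearson correlation is the cosine similarity of the mean-centered vectors, which is why the two columns can differ for the same neuron.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical vectors, not values from the dashboard.
feature = [0.0, 1.0, 2.0, 3.0]
neuron = [1.0, 3.0, 5.0, 7.0]
print(cosine_similarity(feature, neuron))
print(pearson_correlation(feature, neuron))
```

Both functions assume non-constant, non-zero inputs; a zero vector (or a constant vector in the Pearson case) would cause a division by zero.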
Negative Logits
<bos>     -1.50
/**       -0.90
          -0.81
/*        -0.79
<?        -0.79
ⓧ         -0.79
<?        -0.67
become    -0.66
public    -0.64
///**     -0.64
Positive Logits
Metal     2.04
metal     2.02
Metal     2.00
metal     1.91
METAL     1.84
METAL     1.72
métal     1.66
aen       1.51
metals    1.51
Metals    1.47
Activations Density 0.115%